




ELIMINATION OF RANDOMIZATION IN CERTAIN STATISTICAL 
DECISION PROCEDURES AND ZERO-SUM 
TWO-PERSON GAMES¹


By A. Dvoretzky,² A. Wald,³ and J. Wolfowitz³


Institute for Numerical Analysis and Columbia University 


Summary. The general existence of minimax strategies and other important properties proved in the theory of statistical decision functions (e.g., [3]) and the theory of games (e.g., [5]) depends upon the convexity of the space of decision functions and the convexity of the space of strategies. This convexity can be obtained by the use of randomized decision functions and mixed (randomized) strategies. In Section 2 of the present paper the authors state the extension (first announced in [1]) of a measure-theoretical result known as Lyapunov's theorem [2]. This result is applied in Section 3 to the statistical decision problem where the number of distributions and decisions is finite. It is proved that when the distributions are continuous (more generally, “atomless”; see footnote 7 below) randomization is unnecessary in the sense that every randomized decision function can be replaced by an equivalent nonrandomized decision function. Section 4 extends this result to the case when the decision space is compact. Section 5 extends the results of Section 3 to the sequential case. Sections 6 and 7 show, by counterexamples, that the results of Section 3 cannot be extended to the case of infinitely many distributions without new restrictions.⁴ Section 8 gives sufficient conditions for the elimination of randomization under maintenance of $\varepsilon$-equivalence. Section 9 concludes with a restatement of the results in the language of the theory of games.


1. Introduction. We shall consider the following statistical decision problem: Let $x$ be the generic point in an $n$-dimensional Euclidean⁵ space $R$, and let $\Omega$ be a given class of cumulative distribution functions $F(x)$ in $R$. The cumulative distribution function $F(x)$ of the vector chance variable $X = (X_1, \ldots, X_n)$ with range in $R$ is not known. It is known, however, that $F$ is an element of the given class $\Omega$. There is also given a space $D$ whose elements $d$ represent the possible decisions that can be made by the statistician in the problem under consideration. Let $W(F, d, x)$ denote the “loss” when $F$ is the true distribution of $X$, the decision $d$ is made and $x$ is the observed value of $X$. We shall define the distance between two elements $d_1$ and $d_2$ of $D$ by

$$\rho(d_1, d_2) = \sup_{F, x} \left| W(F, d_1, x) - W(F, d_2, x) \right|. \tag{1.1}$$

¹ The main results of this paper were announced without proof in an earlier publication [1] of the authors.

² On leave of absence from the Hebrew University, Jerusalem, Israel.

³ Research under a contract with the Office of Naval Research.

⁴ The impossibility of such an extension is related to the failure of Lyapunov's theorem when infinitely many measures are considered. (cf. A. Lyapunov, “Sur les fonctions-vecteurs complètement additives,” Izvestiya Akad. Nauk SSSR. Ser. Mat., Vol. 10 (1946), pp. 277-279.)

⁵ The restriction to a Euclidean space is not essential (see [1]).


Let $\mathcal{B}$ be the smallest Borel field of subsets of $D$ which contains all open subsets of $D$ as elements. Let $\mathcal{B}_R$ be the totality of Borel sets of $R$. We shall assume that $W(F, d, x)$ is bounded⁶ and, for every $F$, a function of $d$ and $x$ which is measurable $(\mathcal{B} \times \mathcal{B}_R)$. By a decision function $\delta(x)$ we mean a function which associates with each $x$ a probability measure on $D$ defined for all elements of $\mathcal{B}$. We shall occasionally use the symbol $\delta_x$ instead of $\delta(x)$ when we want to emphasize that $x$ is kept fixed. A decision function $\delta(x)$ is said to be nonrandomized if for every $x$ the probability measure $\delta(x)$ assigns the probability one to a single point $d$ of $D$. For any measurable subset $D^*$ of $D$ ($D^*$ an element of $\mathcal{B}$), the symbol $\delta(D^* \mid x)$ will denote the probability measure of $D^*$ according to the set function $\delta(x)$. It will be assumed throughout this paper that for any given $D^*$ the function $\delta(D^* \mid x)$ is a Borel measurable function of $x$. The adoption of a decision function $\delta(x)$ by the statistician means that he proceeds according to the following rule: Let $x$ be the observed value of $X$. The element $d$ of the space $D$ is selected by an independent chance mechanism constructed in such a way that for any measurable subset $D^*$ of $D$ the probability that the selected element $d$ will be included in $D^*$ is equal to $\delta(D^* \mid x)$.
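The chance mechanism just described is easy to make concrete when $D$ is finite. The following Python sketch is illustrative only: it assumes a finite $D$, represents $\delta(x)$ by a hypothetical function `delta(x)` returning the probabilities $\delta(d_j \mid x)$ as a tuple, and uses a seeded pseudo-random generator in place of the independent chance mechanism.

```python
import random
from typing import Callable, Sequence

def choose_decision(x: float,
                    delta: Callable[[float], Sequence[float]],
                    decisions: Sequence[str],
                    rng: random.Random) -> str:
    """Select d in D with probability delta(d | x), independently of X."""
    probs = delta(x)
    if any(p < 0 for p in probs) or abs(sum(probs) - 1.0) > 1e-12:
        raise ValueError("delta(x) must be a probability vector over D")
    # random.choices draws one element of `decisions` with the given weights
    return rng.choices(decisions, weights=probs, k=1)[0]

# Hypothetical randomized rule: each decision with probability 1/2, whatever x is.
half = lambda x: (0.5, 0.5)
# Hypothetical nonrandomized rule: all mass on a single decision for every x.
threshold = lambda x: (1.0, 0.0) if x >= 0 else (0.0, 1.0)

rng = random.Random(0)
D = ("d1", "d2")
draws = [choose_decision(0.3, half, D, rng) for _ in range(10_000)]
freq_d1 = draws.count("d1") / len(draws)
```

A nonrandomized rule is simply one whose tuple always puts all mass on one coordinate, so the draw becomes a deterministic function of $x$.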

Given the sample point $x$ and given that $\delta(x)$ is the decision function adopted, the expected value of the loss $W(F, d, x)$ is given by

$$W^*(F, \delta, x) = \int_D W(F, d, x)\, d\delta_x. \tag{1.2}$$

The expected value of the loss $W(F, d, x)$ when $F$ is the true distribution of $X$ and $\delta(x)$ is the decision function adopted (but $x$ is not known) is obviously equal to

$$r(F, \delta) = \int_R W^*(F, \delta, x)\, dF(x). \tag{1.3}$$

The above expression is called the risk when $F$ is true and $\delta$ is adopted. We shall say that the decision functions $\delta(x)$ and $\delta^*(x)$ are equivalent if

$$r(F, \delta^*) = r(F, \delta) \quad \text{for all } F \text{ in } \Omega. \tag{1.4}$$
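For a finite $D$ and a distribution with a density on $[0, 1]$, (1.2) and (1.3) reduce to a one-dimensional integral that can be approximated numerically. The sketch below uses the midpoint rule; the two densities $f_1(x) = 1$ and $f_2(x) = 2x$, the zero-one losses, and the two rules compared are hypothetical choices made only for illustration.

```python
from typing import Callable, Sequence

Rule = Callable[[float], Sequence[float]]   # x -> (delta(d_1|x), ..., delta(d_m|x))
Loss = Callable[[int, float], float]        # (j, x) -> W(F, d_{j+1}, x) for one fixed F

def risk(density: Callable[[float], float], loss: Loss, delta: Rule,
         n: int = 20_000) -> float:
    """Midpoint-rule approximation of r(F, delta) for a density on [0, 1]."""
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        x = (k + 0.5) * h
        # W*(F, delta, x) of (1.2) when D = {d_1, ..., d_m} is finite
        w_star = sum(loss(j, x) * p for j, p in enumerate(delta(x)))
        total += w_star * density(x) * h    # dF(x) = f(x) dx, as in (1.3)
    return total

def equivalent(risks_a: Sequence[float], risks_b: Sequence[float],
               tol: float = 1e-6) -> bool:
    """Numerical check of (1.4) over a finite list of distributions."""
    return all(abs(a - b) <= tol for a, b in zip(risks_a, risks_b))

# Hypothetical Omega = {F_1, F_2} with densities 1 and 2x on [0, 1];
# d_1 is correct under F_1 and d_2 is correct under F_2 (zero-one loss).
densities = [lambda x: 1.0, lambda x: 2.0 * x]
losses = [lambda j, x: 0.0 if j == 0 else 1.0,
          lambda j, x: 1.0 if j == 0 else 0.0]
half = lambda x: (0.5, 0.5)                                   # randomized
accept_low = lambda x: (1.0, 0.0) if x < 0.5 else (0.0, 1.0)  # nonrandomized

r_half = [risk(f, w, half) for f, w in zip(densities, losses)]
r_low = [risk(f, w, accept_low) for f, w in zip(densities, losses)]
```

Here the two rules have risks $(1/2, 1/2)$ and $(1/2, 1/4)$, so they are not equivalent in the sense of (1.4).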


We shall say that $\delta(x)$ and $\delta^*(x)$ are strongly equivalent if for every measurable subset $D^*$ of $D$ we have

$$\int_R \delta(D^* \mid x)\, dF(x) = \int_R \delta^*(D^* \mid x)\, dF(x) \quad \text{for all } F \text{ in } \Omega. \tag{1.5}$$

⁶ The restriction of boundedness is not essential (see [1]).

If $\delta$ and $\delta^*$ are strongly equivalent, they are equivalent for any loss function which is a function of $F$ and $d$ only.


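For any positive $\varepsilon$, we shall say that $\delta(x)$ and $\delta^*(x)$ are $\varepsilon$-equivalent if

$$\left| r(F, \delta) - r(F, \delta^*) \right| \le \varepsilon \quad \text{for all } F \text{ in } \Omega, \tag{1.6}$$

and strongly $\varepsilon$-equivalent if

$$\left| \int_R \delta(D^* \mid x)\, dF(x) - \int_R \delta^*(D^* \mid x)\, dF(x) \right| \le \varepsilon \tag{1.7}$$

for all measurable $D^*$ and for all $F$ in $\Omega$.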
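When $D = \{d_1, \ldots, d_m\}$ is finite, condition (1.7) for a fixed $F$ involves only the $m$ numbers $\int \delta(d_j \mid x)\, dF(x)$ and their counterparts for $\delta^*$. The supremum over all $2^m$ subsets $D^*$ is attained by collecting the coordinates where the difference is positive (or those where it is negative). The sketch below, with hypothetical function names and inputs, computes that supremum in closed form and, as a cross-check, by brute force.

```python
from itertools import combinations
from typing import Sequence

def strong_gap(p: Sequence[float], q: Sequence[float]) -> float:
    """sup over D* of |delta(D*) - delta*(D*)| for one F, where
    p[j] = int delta(d_j | x) dF(x) and q[j] = int delta*(d_j | x) dF(x)."""
    pos = sum(max(a - b, 0.0) for a, b in zip(p, q))
    neg = sum(max(b - a, 0.0) for a, b in zip(p, q))
    return max(pos, neg)   # pos == neg (up to rounding) when p and q both sum to 1

def strong_gap_bruteforce(p: Sequence[float], q: Sequence[float]) -> float:
    """The same quantity, enumerating every subset D* of D."""
    m = len(p)
    return max(abs(sum(p[j] - q[j] for j in subset))
               for k in range(m + 1) for subset in combinations(range(m), k))

def strongly_eps_equivalent(P, Q, eps: float) -> bool:
    """(1.7) for finitely many F: P[i] and Q[i] are the vectors p, q under F_i."""
    return all(strong_gap(p, q) <= eps for p, q in zip(P, Q))

# Hypothetical values for m = 3 decisions and a single F.
p = (0.5, 0.3, 0.2)
q = (0.4, 0.4, 0.2)
```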

In Section 2 we state a measure-theoretical result first announced in [1] and proved in [6]. This result is then used in Section 3 to prove that for every decision function there exists an equivalent, as well as a strongly equivalent, nonrandomized decision function $\delta^*$, if $\Omega$ and $D$ are finite and if each element $F(x)$ of $\Omega$ is atomless.⁷ This result is extended in Section 4 to the case where $D$ is compact. Section 5 deals with the sequential case, for which similar results are proved. A precise definition of a sequential decision function is given in Section 5.

The finiteness of $\Omega$ is essential for the validity of the results given in Sections 2-5. The examples given in Section 6 show that even when $\Omega$ is such a simple class as the class of all univariate normal distributions with unit variance, there exist decision functions $\delta$ such that no equivalent nonrandomized decision functions exist. In Section 7, an example is given where a decision function $\delta$ and a positive $\varepsilon$ exist such that no nonrandomized decision function $\delta^*$ is $\varepsilon$-equivalent to $\delta$.

In Section 8, sufficient conditions are given which guarantee that for every $\delta$ and for every $\varepsilon > 0$ there exists a nonrandomized decision function $\delta^*$ which is $\varepsilon$-equivalent to $\delta$.


2. A measure-theoretical result. Let $\{y\} = Y$ be any space and let $\{S\} = \mathfrak{S}$ be a Borel field of subsets of $Y$. Let $\mu_k(S)$ $(k = 1, \ldots, q)$ be a finite number of real-valued, $\sigma$-finite and countably additive set functions defined for all $S \in \mathfrak{S}$. The following theorem was stated by the authors [1]:

Theorem 2.1. Let $\delta_j(y)$ $(j = 1, 2, \ldots, m)$ be real nonnegative $\mathfrak{S}$-measurable functions satisfying

$$\sum_{j=1}^{m} \delta_j(y) = 1 \tag{2.1}$$

for all $y \in Y$. Then if the set functions $\mu_k(S)$ are atomless there exists a decomposition of $Y$ into $m$ disjoint subsets $S_1, \ldots, S_m$ belonging to $\mathfrak{S}$ having the property that

$$\int_Y \delta_j(y)\, d\mu_k(y) = \mu_k(S_j) \quad (j = 1, \ldots, m;\ k = 1, \ldots, q). \tag{2.2}$$

⁷ A set function $\mu$ defined on a Borel field $\mathfrak{S}$ is called atomless if it has the following property: If for some $S \in \mathfrak{S}$, $\mu(S) \ne 0$, then there exists an $S' \subset S$ such that $S' \in \mathfrak{S}$ and such that $\mu(S') \ne \mu(S)$ and $\mu(S') \ne 0$. A cumulative distribution function is called atomless if its associated set function is atomless.

If $\delta^*_j(y) = 1$ for all $y \in S_j$ and $\delta^*_j(y) = 0$ for any other $y$ $(j = 1, \ldots, m)$, then the above equation can be written as

$$\int_Y \delta^*_j(y)\, d\mu_k(y) = \int_Y \delta_j(y)\, d\mu_k(y). \tag{2.3}$$


This theorem is an extension of a result of A. Lyapunov [2] and is basic for 
deriving most of the results of the present paper. 
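Theorem 2.1 is an existence statement, but a discretized analogue shows where atomlessness enters. In the sketch below (an illustration of our own, not the authors' proof technique), $Y = [0, 1]$ is cut into $N$ cells, the $q$ measures are recorded as a hypothetical $q \times N$ matrix `A` of cell masses, and $m = 2$, so that $\delta_1$ alone determines the rule. Moving along null-space directions of `A` keeps every $\int \delta_1\, d\mu_k$ fixed while driving cell values to 0 or 1; at most $q$ cells can remain fractional. That residue is a discretization error of at most $q$ times the largest cell mass, which is the part atomless measures allow one to remove in the limit. The example measures (Lebesgue measure and the measure with density $2y$) and $\delta_1(y) = 0.3 + 0.4y$ are arbitrary choices.

```python
import numpy as np

def round_to_extreme_point(s, A, tol=1e-12):
    """Given s in [0, 1]^N and a q x N matrix A, return s' in [0, 1]^N with
    A @ s' equal to A @ s (up to rounding) and at most q entries in (0, 1)."""
    s = np.asarray(s, dtype=float).copy()
    q = A.shape[0]
    while True:
        frac = np.flatnonzero((s > tol) & (s < 1 - tol))
        if frac.size <= q:
            return s
        cols = frac[: q + 1]
        # A[:, cols] is q x (q + 1), so its last right-singular vector v
        # satisfies A[:, cols] @ v = 0: moving along v leaves A @ s unchanged.
        v = np.linalg.svd(A[:, cols])[2][-1]
        # Largest step t > 0 keeping every entry of s[cols] + t * v in [0, 1].
        with np.errstate(divide="ignore", invalid="ignore"):
            up = np.where(v > tol, (1 - s[cols]) / v, np.inf)
            down = np.where(v < -tol, -s[cols] / v, np.inf)
        t = min(up.min(), down.min())
        s[cols] = np.clip(s[cols] + t * v, 0.0, 1.0)
        # At least one coordinate has reached 0 or 1; snap it exactly.
        s[s < tol] = 0.0
        s[s > 1 - tol] = 1.0

# Hypothetical example: N cells, q = 2 measures, delta_1(y) = 0.3 + 0.4 y.
N = 200
mid = (np.arange(N) + 0.5) / N
A = np.vstack([np.full(N, 1.0 / N), 2.0 * mid / N])   # mu_k(cell_i)
s0 = 0.3 + 0.4 * mid                                  # delta_1 on each cell
s = round_to_extreme_point(s0, A)
n_fractional = int(np.sum((s > 1e-12) & (s < 1 - 1e-12)))
```

The cells with `s == 1` play the role of $S_1$ in (2.2); each iteration fixes at least one more cell, so the loop ends after at most $N$ steps.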


3. Elimination of randomization when $\Omega$ and $D$ are finite and each element $F(x)$ of $\Omega$ is atomless. In this section we shall assume that $\Omega$ consists of the elements $F_1(x), \ldots, F_p(x)$ and $D$ of the elements $d_1, \ldots, d_m$. Moreover, we assume that $F_i(x)$ is atomless for $i = 1, \ldots, p$. A decision function $\delta(x)$ is now given by a vector function $\delta(x) = [\delta_1(x), \ldots, \delta_m(x)]$ such that

$$\delta_j(x) \ge 0, \qquad \sum_{j=1}^{m} \delta_j(x) = 1 \tag{3.1}$$

for all $x \in R$. Here $\delta_j(x)$ is the probability that the decision $d_j$ will be made when $x$ is the observed value of $X$. The risk when $F_i$ is true and the decision function $\delta(x)$ is adopted is now given by

$$r(F_i, \delta) = \sum_{j=1}^{m} \int_R W(F_i, d_j, x)\, \delta_j(x)\, dF_i(x). \tag{3.2}$$

A nonrandomized decision function $\delta^*(x)$ is a vector function whose components $\delta^*_j(x)$ can take only the values 0 and 1 for all $x$.

For any measurable subset $S$ of $R$ let

$$\nu_{ij}(S) = \int_S W(F_i, d_j, x)\, dF_i(x) \quad (i = 1, \ldots, p;\ j = 1, \ldots, m). \tag{3.3}$$


Then the measures $\nu_{ij}(S)$ are finite, atomless and countably additive. Using these set functions, equation (3.2) can be written as

$$r(F_i, \delta) = \sum_{j=1}^{m} \int_R \delta_j(x)\, d\nu_{ij}(x). \tag{3.4}$$

Replacing in Theorem 2.1 the space $Y$ by $R$ and the set of measures $\{\mu_1, \ldots, \mu_q\}$ by the set $\{\nu_{ij}\}$ $(i = 1, \ldots, p;\ j = 1, \ldots, m)$, it follows from Theorem 2.1 that there exists a nonrandomized decision function $\delta^*(x)$ such that

$$\int_R \delta^*_j(x)\, d\nu_{ij}(x) = \int_R \delta_j(x)\, d\nu_{ij}(x) \quad (i = 1, \ldots, p;\ j = 1, \ldots, m). \tag{3.5}$$

This immediately yields the following theorems:





Theorem 3.1. If $\Omega$ and $D$ are finite and if each element $F(x)$ of $\Omega$ is atomless, then for any decision function $\delta(x)$ there exists an equivalent nonrandomized decision function $\delta^*(x)$.

Putting $W(F, d, x) = 1$ identically in $F$, $d$ and $x$, equation (3.5) immediately yields the following theorem:

Theorem 3.2. If $\Omega$ and $D$ are finite and if each element $F(x)$ of $\Omega$ is atomless, then for any decision function $\delta(x)$ there exists a strongly equivalent nonrandomized decision function $\delta^*(x)$.
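A small hypothetical case makes Theorems 3.1 and 3.2 concrete. Take $\Omega = \{F_1, F_2\}$ with densities $f_1(x) = 1$ and $f_2(x) = 2x$ on $[0, 1]$, $D = \{d_1, d_2\}$, and zero-one losses not depending on $x$, with $d_1$ correct under $F_1$. Then $r(F_1, \delta) = 1 - \alpha$ and $r(F_2, \delta) = \beta$, where $\alpha = \int \delta_1\, dF_1$ and $\beta = \int \delta_1\, dF_2$, so any rule with the same $\alpha$ and $\beta$ is equivalent (and, $D$ having two elements, strongly equivalent). The sketch below, an illustration rather than the authors' construction, searches for a nonrandomized rule among intervals $[a, a + \alpha]$, using exact rational arithmetic.

```python
from fractions import Fraction
from typing import Tuple

def interval_rule(alpha: Fraction, beta: Fraction) -> Tuple[Fraction, Fraction]:
    """Return (a, b) such that the rule 'choose d_1 iff a <= x <= b' has
    int delta*_1 dF_1 = alpha and int delta*_1 dF_2 = beta, for the
    hypothetical densities f_1(x) = 1 and f_2(x) = 2x on [0, 1]."""
    alpha, beta = Fraction(alpha), Fraction(beta)
    if not 0 < alpha <= 1:
        raise ValueError("need 0 < alpha <= 1")
    # On [a, a + alpha]: int 1 dx = alpha and int 2x dx = alpha * (2a + alpha).
    a = (beta / alpha - alpha) / 2
    if not 0 <= a <= 1 - alpha:
        raise ValueError("beta must lie in [alpha**2, 2*alpha - alpha**2]")
    return a, a + alpha

def moments_of_interval(a: Fraction, b: Fraction) -> Tuple[Fraction, Fraction]:
    """(int_a^b f_1, int_a^b f_2) = (b - a, b**2 - a**2)."""
    return b - a, b * b - a * a

# Randomized rule delta_1(x) = 1/2: alpha = 1/2, beta = int x dx = 1/2.
print(interval_rule(Fraction(1, 2), Fraction(1, 2)))
# Randomized rule delta_1(x) = x: alpha = int x dx = 1/2, beta = int 2x^2 dx = 2/3.
print(interval_rule(Fraction(1, 2), Fraction(2, 3)))
```

For these two densities every attainable pair satisfies $\alpha^2 \le \beta \le 2\alpha - \alpha^2$, so the interval search always succeeds. With more distributions or losses depending on $x$, intervals need not suffice, and Theorem 2.1 only guarantees some measurable set.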


4. Elimination of randomization when $\Omega$ is finite, $D$ is compact and each element $F(x)$ of $\Omega$ is atomless. Again, let $\Omega = \{F_1, \ldots, F_p\}$, where the distributions $F_i$ are atomless. If the loss $W(F, d, x)$ does not depend on $x$, the finiteness of $\Omega$ implies that $D$ is at least conditionally compact with respect to the metric (1.1) (see Theorem 3.1 in [3]). We postulate that $D$ is compact (but permit the loss to depend on $x$), and shall prove that if $\delta(x)$ is any decision function, there exists a nonrandomized decision function $\delta^*(x)$ such that $\delta^*(x)$ is equivalent to $\delta(x)$, i.e.,

$$r_i(\delta) = r_i(\delta^*) \quad (i = 1, \ldots, p), \tag{4.1}$$

where $r_i(\delta)$ stands for $r(F_i, \delta)$.

Since $D$ is compact there exists an infinite sequence of decompositions of the space $D$ into a finite number of disjoint nonempty measurable sets, the $l$-th decomposition being $C(1, 1, \ldots, 1), \ldots, C(k_1, \ldots, k_l)$, with the following properties:

(a) Any two sets $C$ which have the same number of indices, not all identical, are disjoint.

(b) The sum of all sets with the same number $l$ of indices is $D$ $(l = 1, 2, \ldots, \text{ad inf.})$.

(c) If the sequence of indices of one set $C$ constitutes a proper initial part of the sequence of indices of another set $C$, the first set includes the second.

(d) The diameters of all sets with $l$ indices are bounded above by $\Delta(l)$, and $\lim_{l \to \infty} \Delta(l) = 0$.
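When $D$ is the unit interval (a hypothetical choice; the text allows any compact $D$), binary subdivision gives a decomposition with properties (a) to (d), with every $k_l = 2$ and $\Delta(l) = 2^{-l}$. The sketch below builds the cells $C(m_1, \ldots, m_l)$ as half-open intervals $[a, b)$ (the cells with $b = 1$ are understood to contain the point 1) and recovers the indices of the nested cells containing a given point, mirroring the sets $C(x; l)$ whose limit defines $c(x)$ later in this section.

```python
from fractions import Fraction
from itertools import product
from typing import Dict, Tuple

Interval = Tuple[Fraction, Fraction]

def cell(indices: Tuple[int, ...]) -> Interval:
    """C(m_1, ..., m_l) for binary subdivision of D = [0, 1].

    Each m_i is 1 (left half) or 2 (right half) of the cell at level i - 1.
    Returns (a, b), read as the half-open interval [a, b).
    """
    a = sum((Fraction(m - 1, 2 ** (i + 1)) for i, m in enumerate(indices)), Fraction(0))
    return a, a + Fraction(1, 2 ** len(indices))

def decomposition(l: int) -> Dict[Tuple[int, ...], Interval]:
    """The l-th decomposition: all 2**l cells with l indices."""
    return {idx: cell(idx) for idx in product((1, 2), repeat=l)}

def indices_of(y: Fraction, l: int) -> Tuple[int, ...]:
    """Indices of the cell with l indices that contains y (0 <= y < 1)."""
    return tuple(1 + int(y * 2 ** i) % 2 for i in range(1, l + 1))

def tiles_unit_interval(l: int) -> bool:
    """Properties (a) and (b): the level-l cells tile [0, 1) without gaps or overlaps."""
    cells = sorted(decomposition(l).values())
    return (cells[0][0] == 0 and cells[-1][1] == 1
            and all(cells[k][1] == cells[k + 1][0] for k in range(len(cells) - 1)))

# Follow the nested cells that contain the point 1/3.
y = Fraction(1, 3)
nested = [cell(indices_of(y, l)) for l in range(1, 31)]
```

The nested cells shrink to the single point $1/3$, which is the one-dimensional picture of the limit point $c(x)$.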
1-+00 


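Let $l$ be fixed and define

$$\lambda_{m_1 \cdots m_l}(x) = \delta[C(m_1, \ldots, m_l) \mid x]. \tag{4.2}$$

Define, furthermore,

$$W_i[x, C(m_1, \ldots, m_l)] = \begin{cases} \dfrac{1}{\lambda_{m_1 \cdots m_l}(x)} \displaystyle\int_{C(m_1, \ldots, m_l)} W(F_i, d, x)\, d\delta_x & \text{if } \lambda_{m_1 \cdots m_l}(x) > 0, \\[1ex] 0 & \text{if } \lambda_{m_1 \cdots m_l}(x) = 0. \end{cases} \tag{4.3}$$

Clearly,

$$r_i(\delta) = \sum_{m_1=1}^{k_1} \cdots \sum_{m_l=1}^{k_l} \int_R W_i[x, C(m_1, \ldots, m_l)]\, \lambda_{m_1 \cdots m_l}(x)\, dF_i(x). \tag{4.4}$$

Considering a decision space $D_l$ with elements $d_{m_1 \cdots m_l}$ $(m_i = 1, \ldots, k_i;\ i = 1, \ldots, l)$ and putting the loss $W(F_i, d_{m_1 \cdots m_l}, x) = W_i[x, C(m_1, \ldots, m_l)]$, equations (3.3) and (3.5) imply that there exists a finite sequence of measurable functions $\lambda^*_{m_1 \cdots m_l}(x)$ $(m_1 = 1, \ldots, k_1;\ \ldots;\ m_l = 1, \ldots, k_l)$ such that

$$\lambda^*_{m_1 \cdots m_l}(x) = 0 \text{ or } 1 \quad \text{for all } x, \tag{4.5}$$

$$\sum_{m_1=1}^{k_1} \cdots \sum_{m_l=1}^{k_l} \lambda^*_{m_1 \cdots m_l}(x) = 1 \quad \text{for all } x, \tag{4.6}$$

$$\lambda^*_{m_1 \cdots m_l}(x) = 0 \quad \text{whenever } \lambda_{m_1 \cdots m_l}(x) = 0, \tag{4.7}$$

and

$$\int_R W_i[x, C(m_1, \ldots, m_l)]\, \lambda^*_{m_1 \cdots m_l}(x)\, dF_i(x) = \int_R W_i[x, C(m_1, \ldots, m_l)]\, \lambda_{m_1 \cdots m_l}(x)\, dF_i(x). \tag{4.8}$$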


Let now $\bar\delta(x)$ be the decision function for which

$$\bar\delta[C(m_1, \ldots, m_l) \mid x] = \lambda^*_{m_1 \cdots m_l}(x) \tag{4.9}$$

and, for any measurable subset $D_{m_1 \cdots m_l}$ of $C(m_1, \ldots, m_l)$,

$$\bar\delta(D_{m_1 \cdots m_l} \mid x) = \lambda^*_{m_1 \cdots m_l}(x)\, \frac{\delta(D_{m_1 \cdots m_l} \mid x)}{\delta[C(m_1, \ldots, m_l) \mid x]}, \tag{4.10}$$

where the quotient $\delta(D_{m_1 \cdots m_l} \mid x) / \delta[C(m_1, \ldots, m_l) \mid x]$ is defined to be 0 when $\delta[C(m_1, \ldots, m_l) \mid x] = 0$.

It then follows from (4.4) and (4.8) that

$$r_i(\bar\delta) = r_i(\delta). \tag{4.11}$$


Applying the above result for $l = 1$, we conclude that there exists a decision function $\delta^1(x)$ with the following properties: The choice among the $C$'s with one index is nonrandom. The decision, once the $C$ (with one index) has been chosen, is made according to $\delta(x)$. We have $\delta^1[C(m) \mid x] = 0$ whenever $\delta[C(m) \mid x] = 0$, and

$$r_i(\delta) = r_i(\delta^1) \quad (i = 1, \ldots, p).$$


Repeat the above procedure for every $C$ with two indices, using $W_i[x, C(m_1, m_2)]$ as weight function and $\delta^1(x)$ as the decision function. We conclude that there exists a decision function $\delta^2(x)$ with the following properties: The choice among the $C$'s with two indices is nonrandom. $\delta^2[C(m_1, m_2) \mid x] = 0$ whenever $\delta^1[C(m_1, m_2) \mid x] = 0$. The decision, once the $C$ (with two indices) has been chosen, is made according to $\delta^1(x)$ and, therefore, in accordance with $\delta(x)$. We have

$$\int_R \int_{C(m_1)} W(F_i, d, x)\, d\delta^1_x\, dF_i(x) = \int_R \int_{C(m_1)} W(F_i, d, x)\, d\delta^2_x\, dF_i(x) \quad (i = 1, \ldots, p;\ m_1 = 1, \ldots, k_1).$$

Repeat the above procedure for all $C$'s with $l$ indices, $l = 3, 4, \ldots, \text{ad inf.}$ At the $l$-th stage we obtain a decision function $\delta^l(x)$ with the following properties: The decision among the $C$'s with $l$ indices is nonrandom. $\delta^l[C(m_1, \ldots, m_l) \mid x] = 0$ whenever $\delta^{l-1}[C(m_1, \ldots, m_l) \mid x] = 0$. The decision, once the $C$ with $l$ indices has been chosen, is made according to $\delta(x)$. We have

$$\int_R \int_{C(m_1, \ldots, m_{l-1})} W(F_i, d, x)\, d\delta^{l-1}_x\, dF_i(x) = \int_R \int_{C(m_1, \ldots, m_{l-1})} W(F_i, d, x)\, d\delta^l_x\, dF_i(x)$$

$(i = 1, \ldots, p;\ m_1 = 1, \ldots, k_1;\ \ldots;\ m_{l-1} = 1, \ldots, k_{l-1})$, and hence, summing over the cells, $r_i(\delta^l) = r_i(\delta)$.


Hold $x$ fixed and let $C(x; l)$ be that $C$ with $l$ indices for which

$$\int_{C(x; l)} d\delta^l_x = 1.$$

Then $C(x; l + 1)$ is a subset of $C(x; l)$ for every positive $l$. The sequence $C(x; l)$, $l = 1, 2, \ldots$, determines, because $D$ is compact, a unique limit point $c(x)$ such that any neighborhood of $c(x)$ contains almost all sets $C(x; l)$. Hence the sequence of probability measures $\delta^l_x$ $(l = 1, 2, \ldots, \text{ad inf.})$ converges to a limit probability measure $\delta^*_x$ which assigns probability one to any measurable set which contains the point $c(x)$. Since $W(F_i, d, x)$ is continuous in $d$ (by (1.1), $|W(F_i, d, x) - W(F_i, d', x)| \le \rho(d, d')$), we have

$$\lim_{l \to \infty} \int_D W(F_i, d, x)\, d\delta^l_x = \int_D W(F_i, d, x)\, d\delta^*_x \tag{4.12}$$

for any $x$.

Now let $x$ vary over $R$. It follows from (4.12) and the boundedness of $W(F, d, x)$ that $\lim_{l \to \infty} r_i(\delta^l) = r_i(\delta^*)$. Since $r_i(\delta^l) = r_i(\delta)$, also $r_i(\delta^*) = r_i(\delta)$ $(i = 1, \ldots, p)$. Thus the probability measures $\delta^*(x)$ constitute the desired nonrandomized decision function.

It remains to show that for any measurable subset $D^*$ of $D$, the function $\delta^*(D^* \mid x)$ is a measurable function of $x$. The measurability of $\delta^*(D^* \mid x)$ can easily be shown for any $D^*$ if it is shown for all closed sets $D^*$, since every measurable set can be attained by a denumerable number of Borel operations (denumerably infinite sums and complements) starting with closed sets. Thus we shall assume that $D^*$ is closed. For any positive $\rho$ let $D^*_\rho$ be the sum of all open spheres with center in $D^*$ and radius $\rho$. It is easy to see that

$$\delta^*(D^*_{2\rho} \mid x) \ge \liminf_{l \to \infty} \delta^l(D^*_\rho \mid x) \ge \delta^*(D^* \mid x).$$

Since $\lim_{\rho \to 0} \delta^*(D^*_{2\rho} \mid x) = \delta^*(D^* \mid x)$, it follows from the above relation that

$$\lim_{\rho \to 0}\, \liminf_{l \to \infty} \delta^l(D^*_\rho \mid x) = \delta^*(D^* \mid x).$$

Since $\delta^l(D^*_\rho \mid x)$ is a measurable function of $x$, the measurability of $\delta^*(D^* \mid x)$ is proved.


5. Elimination of randomization in the sequential case. In this section we shall consider the following sequential decision problem: Let $X = \{X_n\}$ $(n = 1, 2, \ldots, \text{ad inf.})$ be a sequence of chance variables. Let $x$ be the generic point in the space $R$ of all infinite sequences of real numbers, i.e., $x = \{x_n\}$ $(n = 1, 2, \ldots, \text{ad inf.})$, where each $x_n$ is a real number. It is known that the distribution function $F(x)$ of $X$ is an element of $\Omega$, where $\Omega$ consists of a finite number of distribution functions $F_1(x), \ldots, F_p(x)$, and that the distribution function of $X_1$ is continuous according to $F_i(x)$, $i = 1, \ldots, p$. The statistician is assumed to have a choice of a finite number of (terminal) decisions $d_1, \ldots, d_m$, i.e., the space $D$ consists of the elements $d_1, d_2, \ldots, d_m$. A decision rule $\delta$ is now given by a sequence of nonnegative, measurable functions $\delta_{\nu t}(x_1, \ldots, x_t)$ $(\nu = 0, 1, \ldots, m;\ t = 1, 2, \ldots, \text{ad inf.})$ satisfying

$$\sum_{\nu=0}^{m} \delta_{\nu t}(x_1, \ldots, x_t) = 1 \tag{5.1}$$

for $-\infty < x_1, \ldots, x_t < \infty$. The decision rule $\delta$ is defined in terms of the functions $\delta_{\nu t}$ as follows: After the value $x_1$ of $X_1$ has been observed, the statistician decides either to continue experimentation and take another observation, or to stop further experimentation and adopt a terminal decision $d_j$ $(j = 1, \ldots, m)$, with the respective probabilities $\delta_{01}(x_1)$ and $\delta_{j1}(x_1)$ $(j = 1, \ldots, m)$. If it is decided to continue experimentation, a value $x_2$ of $X_2$ is observed and it is again decided either to take a further observation or to adopt a terminal decision $d_j$ $(j = 1, \ldots, m)$, with the respective probabilities $\delta_{02}(x_1, x_2)$ and $\delta_{j2}(x_1, x_2)$ $(j = 1, \ldots, m)$, etc. The decision rule is called nonrandomized if each $\delta_{\nu t}$ can take only the values 0 and 1.
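The verbal description of a sequential decision rule translates directly into a simulation loop. The Python sketch below is illustrative: a rule is a hypothetical function of the observations $(x_1, \ldots, x_t)$ returning $(\delta_{0t}, \delta_{1t}, \ldots, \delta_{mt})$, the observations are drawn from a standard normal distribution, and the example rule (continue with probability 1/2 before stage 3, then stop, choosing $d_1$ or $d_2$ by the sign of the last observation) is made up for illustration. It is truncated at stage 3 in the sense defined later in this section.

```python
import random
from typing import Callable, List, Sequence, Tuple

# A rule maps the observations (x_1, ..., x_t) to (delta_0t, delta_1t, ..., delta_mt);
# index 0 means "take another observation", index nu >= 1 means "stop with d_nu".
Rule = Callable[[Sequence[float]], Sequence[float]]

def run_sequential(rule: Rule, draw: Callable[[random.Random], float],
                   rng: random.Random, max_steps: int = 10_000) -> Tuple[int, int]:
    """Return (t, nu): stopped after t observations with terminal decision d_nu."""
    xs: List[float] = []
    for t in range(1, max_steps + 1):
        xs.append(draw(rng))
        probs = rule(xs)
        nu = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        if nu != 0:
            return t, nu
    raise RuntimeError("no terminal decision within max_steps")

def truncated_rule(xs: Sequence[float]) -> Sequence[float]:
    """Hypothetical rule with m = 2, truncated at stage 3 (delta_03 = 0)."""
    terminal = (1.0, 0.0) if xs[-1] >= 0 else (0.0, 1.0)   # d_1 or d_2
    cont = 0.5 if len(xs) < 3 else 0.0
    # The factor (1 - cont) keeps the probabilities summing to one, as (5.1) requires.
    return (cont, (1 - cont) * terminal[0], (1 - cont) * terminal[1])

rng = random.Random(1)
runs = [run_sequential(truncated_rule, lambda r: r.gauss(0.0, 1.0), rng)
        for _ in range(2000)]
```

The stopping stage is random here because $\delta_{01}$ and $\delta_{02}$ are neither 0 nor 1; a nonrandomized rule would make both the stopping time and the terminal decision functions of the observations alone.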

Let $v_{i\nu t}(x_1, \ldots, x_t)$ represent the sum of the loss and the cost of experimentation when $F_i$ is true, the terminal decision $d_\nu$ is made and experimentation is terminated with the $t$-th observation $(\nu = 1, 2, \ldots, m;\ i = 1, \ldots, p;\ t = 1, 2, \ldots, \text{ad inf.})$. The functions $v_{i\nu t}(x_1, \ldots, x_t)$ are assumed to be finite, nonnegative and measurable. We shall consider only decision rules $\delta$ for which the probability is one that experimentation will be terminated at some finite stage. The risk (expected loss plus expected cost of experimentation) when $F_i$ is true and the rule $\delta$ is adopted is then given by

$$r_i(\delta) = \sum_{t=1}^{\infty} \sum_{\nu=1}^{m} \int_{R_t} v_{i\nu t}(x_1, \ldots, x_t)\, \delta_{01}(x_1)\, \delta_{02}(x_1, x_2) \cdots \delta_{0(t-1)}(x_1, \ldots, x_{t-1})\, \delta_{\nu t}(x_1, \ldots, x_t)\, dF_{it}(x_1, \ldots, x_t), \tag{5.2}$$

where $R_t$ is the $t$-dimensional space of $x_1, \ldots, x_t$ and $F_{it}(x_1, \ldots, x_t)$ is the cumulative distribution function of $X_1, \ldots, X_t$ when $F_i$ is the distribution function of $X$.

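We shall say that the decision rules $\delta^1$ and $\delta^2$ are equivalent if $r_i(\delta^1) = r_i(\delta^2)$ for $i = 1, \ldots, p$. We shall say that $\delta^1$ and $\delta^2$ are strongly equivalent if

$$\int_{R_t} v_{i\nu t}(x_1, \ldots, x_t)\, \delta^1_{01}(x_1) \cdots \delta^1_{0(t-1)}(x_1, \ldots, x_{t-1})\, \delta^1_{\nu t}(x_1, \ldots, x_t)\, dF_{it} = \int_{R_t} v_{i\nu t}(x_1, \ldots, x_t)\, \delta^2_{01}(x_1) \cdots \delta^2_{0(t-1)}(x_1, \ldots, x_{t-1})\, \delta^2_{\nu t}(x_1, \ldots, x_t)\, dF_{it} \tag{5.3}$$

for $i = 1, 2, \ldots, p$; $\nu = 1, \ldots, m$ and $t = 1, 2, \ldots, \text{ad inf.}$

Clearly, if $\delta^1$ and $\delta^2$ are strongly equivalent and if the functions $v_{i\nu t}(x_1, \ldots, x_t)$ reduce to constants $v_{i\nu t}$, then $\delta^1$ and $\delta^2$ are equivalent for all possible choices of the constants $v_{i\nu t}$.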

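Let

$$\varphi_i(x, \delta) = \sum_{t=1}^{\infty} \sum_{\nu=1}^{m} v_{i\nu t}(x_1, \ldots, x_t)\, \delta_{01}(x_1) \cdots \delta_{0(t-1)}(x_1, \ldots, x_{t-1})\, \delta_{\nu t}(x_1, \ldots, x_t). \tag{5.4}$$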


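We shall prove the following lemma:

Lemma 5.1. Let $\delta$ be a decision rule for which $\varphi_i(x, \delta) < \infty$ for all $x$, except perhaps on a set of $x$'s whose probability is zero according to every distribution function $F_i(x)$ $(i = 1, \ldots, p)$. Let $\tau$ and $T$ be given positive integers. Then there exists a decision rule $\bar\delta$ with the following properties:

$$\bar\delta_{\nu\tau}(x_1, \ldots, x_\tau) = 0 \text{ or } 1, \qquad \sum_{\nu=0}^{m} \bar\delta_{\nu\tau}(x_1, \ldots, x_\tau) = 1 \tag{5.5}$$

for every point in $R_\tau$ $(\nu = 0, 1, \ldots, m)$,

$$\bar\delta_{\nu t}(x_1, \ldots, x_t) = \delta_{\nu t}(x_1, \ldots, x_t) \quad (\nu = 0, 1, \ldots, m;\ t \ne \tau), \tag{5.6}$$

$$r_i(\bar\delta) = r_i(\delta) \quad (i = 1, \ldots, p), \tag{5.7}$$

$$\int_{R_t} v_{i\nu t}\, \bar\delta_{01} \cdots \bar\delta_{0(t-1)}\, \bar\delta_{\nu t}\, dF_{it} = \int_{R_t} v_{i\nu t}\, \delta_{01} \cdots \delta_{0(t-1)}\, \delta_{\nu t}\, dF_{it} \quad (i = 1, \ldots, p;\ \nu = 1, \ldots, m;\ t = 1, \ldots, T), \tag{5.8}$$

$$\varphi_i(x, \bar\delta) < \infty \tag{5.9}$$

for all $x$ except perhaps on a set whose probability is zero according to every distribution $F_i(x)$ $(i = 1, \ldots, p)$.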
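Proof. We can write $\varphi_i(x, \delta)$ as follows:

$$\varphi_i(x, \delta) = \sum_{t=1}^{\tau-1} \sum_{\nu=1}^{m} v_{i\nu t}(x_1, \ldots, x_t)\, \delta_{01} \cdots \delta_{0(t-1)}\, \delta_{\nu t} + \sum_{t=\tau}^{\infty} \sum_{\nu=0}^{m} g_{i\nu\tau t}(x_1, \ldots, x_t)\, \delta_{\nu\tau}(x_1, \ldots, x_\tau), \tag{5.10}$$

where $g_{i\nu\tau t}(x_1, \ldots, x_t)$ does not depend on $\delta_{0\tau}, \delta_{1\tau}, \ldots, \delta_{m\tau}$. The first double sum reduces to zero when $\tau = 1$. Clearly, if a $\bar\delta$ with the desired properties exists, then

$$\varphi_i(x, \bar\delta) = \sum_{t=1}^{\tau-1} \sum_{\nu=1}^{m} v_{i\nu t}(x_1, \ldots, x_t)\, \delta_{01} \cdots \delta_{0(t-1)}\, \delta_{\nu t} + \sum_{t=\tau}^{\infty} \sum_{\nu=0}^{m} g_{i\nu\tau t}(x_1, \ldots, x_t)\, \bar\delta_{\nu\tau}(x_1, \ldots, x_\tau). \tag{5.11}$$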

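For any measurable subset $S$ of $R$, let

$$\mu_{i\nu\tau t}(S) = \int_S g_{i\nu\tau t}(x_1, \ldots, x_t)\, dF_{it} \quad (t = \tau, \tau + 1, \ldots, T), \tag{5.12}$$

and

$$\mu_{i\nu}(S) = \int_S \sum_{t=T+1}^{\infty} g_{i\nu\tau t}(x_1, \ldots, x_t)\, dF_i. \tag{5.13}$$

The measures $\mu_{i\nu\tau t}$ are not defined if $t > T$. Clearly, the measures $\mu_{i\nu\tau t}$ $(\nu = 0, 1, \ldots, m;\ t = \tau, \tau + 1, \ldots, T)$ and the measures $\mu_{i\nu}$ $(\nu = 1, \ldots, m)$ are nonnegative, countably additive and $\sigma$-finite. Since for any $x$ for which $\varphi_i(x, \delta) < \infty$ and $\delta_{0\tau} > 0$ the sum

$$\sum_{t=T+1}^{\infty} g_{i0\tau t}(x_1, \ldots, x_t) < \infty,$$

it follows from the assumptions of Lemma 5.1 that $\mu_{i0}$ is $\sigma$-finite over the space $R'$ consisting of all $x$ for which $\delta_{0\tau} > 0$. Of course, $\mu_{i0}$ is nonnegative and countably additive. Let $R''$ be the set of all points $x$ for which $\delta_{0\tau} = 0$. We put

$$\bar\delta_{0\tau}(x_1, \ldots, x_\tau) = 0 \quad \text{for all } x \text{ in } R''. \tag{5.14}$$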


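Application of Theorem 2.1 to each of the spaces $R'$ and $R''$ shows that there exist measurable functions $\bar\delta_{\nu\tau}(x_1, \ldots, x_\tau)$ $(\nu = 0, 1, \ldots, m)$ such that in addition to (5.14) the following conditions hold:

$$\bar\delta_{\nu\tau} = 0 \text{ or } 1 \quad (\nu = 0, 1, \ldots, m) \qquad \text{and} \qquad \sum_{\nu=0}^{m} \bar\delta_{\nu\tau} = 1 \quad \text{for all } x, \tag{5.15}$$

$$\int_R \bar\delta_{\nu\tau}\, d\mu_{i\nu\tau t} = \int_R \delta_{\nu\tau}\, d\mu_{i\nu\tau t} \quad (i = 1, \ldots, p;\ \nu = 0, 1, \ldots, m;\ t = \tau, \tau + 1, \ldots, T), \tag{5.16}$$

$$\int_R \bar\delta_{\nu\tau}\, d\mu_{i\nu} = \int_R \delta_{\nu\tau}\, d\mu_{i\nu} \quad (i = 1, \ldots, p;\ \nu = 0, 1, \ldots, m). \tag{5.17}$$

Lemma 5.1 is a simple consequence of the equations (5.14)-(5.17).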

For any positive integer $u$, we shall say that a decision rule $\delta$ is truncated at the $u$-th stage if $\delta_{0u'} = 0$ for $u' \ge u$, identically in $x$.

Theorem 5.1. If $\delta$ is truncated at the $u$-th stage, there exists a nonrandomized decision rule $\delta^*$ that is strongly equivalent to $\delta$.

Proof. It is sufficient to prove Theorem 5.1 in the case where $\delta_{\nu t} = 0$ for $t > u$ and $\nu \ne 1$, and $\delta_{1t} = 1$ for $t > u$. Clearly, $\varphi_i(x, \delta) < \infty$ for all $x$. Putting $\tau = 1$ and $T = u$ in Lemma 5.1, this lemma implies the existence of a decision rule $\delta^1$ with the following properties: (a) $\delta^1$ is strongly equivalent to $\delta$; (b) $\delta^1_{\nu 1} = 0$ or $1$ $(\nu = 0, 1, \ldots, m)$; (c) $\delta^1_{\nu t} = \delta_{\nu t}$ for $\nu = 0, 1, \ldots, m$ and $t > 1$. Applying Lemma 5.1 to $\delta^1$ and putting $\tau = 2$ and $T = u$, we see that there exists a decision rule $\delta^2$ with the following properties: (a) $\delta^2$ is strongly equivalent to $\delta^1$; (b) $\delta^2_{\nu 2} = 0$ or $1$ $(\nu = 0, 1, \ldots, m)$; (c) $\delta^2_{\nu t} = \delta^1_{\nu t}$ for $\nu = 0, 1, \ldots, m$ and $t \ne 2$. Continuing this procedure, at the $u$-th step we obtain a decision rule $\delta^u$ that is nonrandomized and is strongly equivalent to all the preceding ones. This proves our theorem.

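We shall say that two decision rules $\delta^1$ and $\delta^2$ are strongly equivalent up to the $T$-th stage if

$$\int_{R_t} v_{i\nu t}(x_1, \ldots, x_t)\, \delta^1_{01} \cdots \delta^1_{0(t-1)}\, \delta^1_{\nu t}\, dF_{it} = \int_{R_t} v_{i\nu t}(x_1, \ldots, x_t)\, \delta^2_{01} \cdots \delta^2_{0(t-1)}\, \delta^2_{\nu t}\, dF_{it} \tag{5.18}$$

for $i = 1, \ldots, p$; $\nu = 1, \ldots, m$ and $t = 1, \ldots, T$.

Furthermore, we shall say that a decision rule $\delta$ is nonrandomized up to the stage $T$ if $\delta_{\nu t} = 0$ or $1$ for $\nu = 0, 1, \ldots, m$ and $t = 1, \ldots, T$.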

We now prove the following theorem.

Theorem 5.2. If $\delta$ is a decision rule for which $\varphi_i(x, \delta) < \infty$, except perhaps on a set of $x$'s of probability zero according to every $F_i(x)$ $(i = 1, \ldots, p)$, then there exists a nonrandomized decision rule $\delta^*$ that is equivalent to $\delta$.

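Proof. Let $\{\varepsilon_j\}$ and $\{\eta_j\}$ $(j = 1, 2, \ldots, \text{ad inf.})$ be two sequences of positive numbers such that $\lim_{j \to \infty} \varepsilon_j = 0$ and $\lim_{j \to \infty} \eta_j = \infty$. Let $T_1$ be a positive integer such that, for $i = 1, \ldots, p$,

$$r_i(\delta) - \sum_{t=1}^{T_1} \sum_{\nu=1}^{m} \int_{R_t} v_{i\nu t}(x_1, \ldots, x_t)\, \delta_{01} \cdots \delta_{0(t-1)}\, \delta_{\nu t}\, dF_{it} < \varepsilon_1 \quad \text{if } r_i(\delta) < \infty, \tag{5.19}$$

and

$$\sum_{t=1}^{T_1} \sum_{\nu=1}^{m} \int_{R_t} v_{i\nu t}(x_1, \ldots, x_t)\, \delta_{01} \cdots \delta_{0(t-1)}\, \delta_{\nu t}\, dF_{it} > \eta_1 \quad \text{if } r_i(\delta) = \infty. \tag{5.20}$$

Let $\delta^1$ be a decision rule such that $\varphi_i(x, \delta^1) < \infty$ (except perhaps on a set of probability measure zero); $\delta^1$ is equivalent to $\delta$; $\delta^1$ is strongly equivalent to $\delta$ up to the $T_1$-th stage; $\delta^1$ is nonrandomized up to the $T_1$-th stage; and $\delta^1_{\nu t} = \delta_{\nu t}$ for $t > T_1$. The existence of such a decision rule follows from a repeated application of Lemma 5.1. In general, after $\delta^1, \ldots, \delta^j$ and $T_1, \ldots, T_j$ are given, let $\delta^{j+1}$ be a decision rule such that $\varphi_i(x, \delta^{j+1}) < \infty$ (except perhaps on a set of probability measure zero); $\delta^{j+1}$ is equivalent to $\delta^j$; $\delta^{j+1}$ is strongly equivalent to $\delta^j$ up to the $T_{j+1}$-th stage, where $T_{j+1}$ is a positive integer chosen so that $T_{j+1} > T_j$ and (5.19) and (5.20) hold with $\delta$ replaced by $\delta^j$, $T_1$ replaced by $T_{j+1}$, $\varepsilon_1$ replaced by $\varepsilon_{j+1}$ and $\eta_1$ replaced by $\eta_{j+1}$; $\delta^{j+1}$ is nonrandomized up to the stage $T_{j+1}$; $\delta^{j+1}_{\nu t} = \delta^j_{\nu t}$ for $t \le T_j$; and $\delta^{j+1}_{\nu t} = \delta_{\nu t}$ for $t > T_{j+1}$. The existence of such a decision rule $\delta^{j+1}$ follows again from a repeated application of Lemma 5.1.

Let $\delta^*$ be the decision rule given by the equations

$$\delta^*_{\nu t} = \delta^t_{\nu t} \quad (\nu = 0, 1, \ldots, m;\ t = 1, 2, \ldots, \text{ad inf.}). \tag{5.21}$$

It follows easily from the above stated properties of the decision rules $\delta^j$ $(j = 1, 2, \ldots, \text{ad inf.})$ that $\delta^*$ is nonrandomized and $r_i(\delta^*) = r_i(\delta)$ $(i = 1, \ldots, p)$. This completes the proof of Theorem 5.2.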


6. Examples where admissible⁸ decision functions do not admit equivalent nonrandomized decision functions. In this section we shall construct examples which show that there exist admissible decision functions $\delta(x)$ which do not admit equivalent nonrandomized decision functions $\delta^*(x)$.

Example 1. Let $X$ be a normally distributed chance variable with unknown mean $\theta$ and variance unity. This means that $\Omega$ is the totality of all univariate normal distributions with unit variance. Suppose we wish to test the hypothesis $H_0$ that the true mean $\theta$ is rational on the basis of a single observation $x$ on $X$. Thus, $D$ consists of two elements $d_1$ and $d_2$, where $d_1$ is the decision to accept $H_0$ and $d_2$ is the decision to reject $H_0$. For any decision function $\delta(x)$, let $\delta_1(x)$ denote the value of $\delta(d_1 \mid x)$. Let the loss be zero when a correct decision is made, and one when a wrong decision is made. Then the risk when $\theta$ is the true mean and the decision function $\delta(x)$ is adopted is given by

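$$r(\theta, \delta) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{1}{2}(x - \theta)^2}\, \delta_1(x)\, dx \quad \text{when } \theta \text{ is irrational}, \tag{6.1}$$

$$r(\theta, \delta) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{1}{2}(x - \theta)^2}\, (1 - \delta_1(x))\, dx \quad \text{when } \theta \text{ is rational}. \tag{6.2}$$

⁸ A decision function with risk function $r(F)$ is called admissible if there exists no other decision function with risk function $r'(F)$ such that $r'(F) \le r(F)$ for every $F \in \Omega$ and the strict inequality $r'(F) < r(F)$ holds for at least one $F \in \Omega$.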





Let $\delta^0_1(x) = \tfrac{1}{2}$ for all $x$. Clearly,

$$r(\theta, \delta^0) = \tfrac{1}{2} \tag{6.3}$$

for all $\theta$. We shall now show that $\delta^0(x)$ is an admissible decision function. For suppose that there exists a decision function $\delta'(x)$ such that

$$r(\theta, \delta') \le r(\theta, \delta^0) = \tfrac{1}{2} \tag{6.4}$$

for all $\theta$, and

$$r(\theta_1, \delta') < r(\theta_1, \delta^0) = \tfrac{1}{2} \tag{6.5}$$

for some value $\theta_1$. Suppose first that $\theta_1$ is rational. Then (6.2) and (6.5) give $\frac{1}{\sqrt{2\pi}} \int e^{-\frac{1}{2}(x - \theta_1)^2} \delta'_1(x)\, dx > \tfrac{1}{2}$. Since the integrals in (6.1) and (6.2) are continuous functions of $\theta$, for an irrational value $\theta_2$ sufficiently near to $\theta_1$ we shall have $r(\theta_2, \delta') > \tfrac{1}{2}$, which contradicts (6.4). Thus, $\theta_1$ cannot be rational. In a similar way, one can show that $\theta_1$ cannot be irrational. Hence, the assumption that a decision function $\delta'(x)$ satisfying (6.4) and (6.5) exists leads to a contradiction, and the admissibility of $\delta^0(x)$ is proved.

Let now $\delta^*(x)$ be any decision function for which

$$r(\theta, \delta^*) = r(\theta, \delta^0) \tag{6.6}$$

for all $\theta$. Now (6.6) implies that

$$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{1}{2}(x - \theta)^2}\, (\delta^*_1(x) - \delta^0_1(x))\, dx = 0 \tag{6.7}$$

identically in $\theta$. Since $\delta^*_1(x) - \delta^0_1(x)$ is a bounded function of $x$, it follows from the uniqueness properties of the Laplace transform that (6.7) can hold only if $\delta^*_1(x) - \delta^0_1(x) = 0$ except perhaps on a set of measure zero. Hence, no nonrandomized decision function $\delta^*(x)$ can satisfy (6.6).

In the above example, the distributions consistent with the hypothesis $H_0$ which is to be tested (normal distributions with rational means) are not well separated from the alternative distributions (normal distributions with irrational means). One might think that this is perhaps the reason for the existence of an admissible decision function $\delta^0$ such that no nonrandomized decision function $\delta^*$ can have as good a risk function as $\delta^0$ has. That this need not be so is shown by the following example:

Example 2. Suppose that $X$ is a normally distributed chance variable with mean $\theta$ and variance unity. The value of $\theta$ is unknown. It is known, however, that the true value of $\theta$ is contained in the union of the two intervals $[-2, -1]$ and $[1, 2]$. Suppose that we want to test the hypothesis $H_0$ that $\theta$ is contained in the interval $[-2, -1]$ on the basis of a single observation $x$ on $X$. Suppose, furthermore, that the chance variable $X$ itself is not observable and only the chance variable $Y = f(X)$ can be observed, where $f(x) = x$ when $|x| < 1$ and $f(x) = |x|$ when $|x| \ge 1$. Let the loss be zero when a correct decision is made, and one when a wrong decision is made. For any decision function $\delta(y)$, let $\delta_1(y)$ denote the value of $\delta(d_1 \mid y)$, where $d_1$ denotes the decision to accept $H_0$.
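Let $\delta^0(y)$ be the following decision function:

$$\delta^0_1(y) = \begin{cases} 1 & \text{when } -1 < y < 0, \\ 0 & \text{when } 0 \le y < 1, \\ \tfrac{1}{2} & \text{when } y \ge 1. \end{cases} \tag{6.8}$$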


First we shall show that $\delta^0(y)$ is an admissible decision function. For this purpose, consider the following probability density function $g(\theta)$ in the parameter space: $g(\theta) = \tfrac{1}{2}$ when $-2 \le \theta \le -1$ or $1 \le \theta \le 2$, and $g(\theta) = 0$ for all other $\theta$. If we interpret $g(\theta)$ as the a priori probability distribution of $\theta$, the a posteriori probability of the $\theta$-interval $[-2, -1]$ is greater (less) than the a posteriori probability of the $\theta$-interval $[1, 2]$ when $-1 < y < 0$ $(0 < y < 1)$, and the a posteriori probabilities of the two intervals are equal to each other when $y = 0$ or $y \ge 1$. Hence, $\delta^0(y)$ is a Bayes solution relative to the a priori distribution $g(\theta)$, i.e.,

$$\int_{-2}^{-1} r(\theta, \delta^0)\, d\theta + \int_{1}^{2} r(\theta, \delta^0)\, d\theta \le \int_{-2}^{-1} r(\theta, \delta)\, d\theta + \int_{1}^{2} r(\theta, \delta)\, d\theta \tag{6.9}$$

for any decision function $\delta$. Suppose now that $\delta$ is a decision function for which $r(\theta, \delta) \le r(\theta, \delta^0)$ for all $\theta$. It then follows from (6.9) that $r(\theta, \delta) < r(\theta, \delta^0)$ can hold at most on a set of $\theta$'s of measure zero. Since, as can easily be verified, $r(\theta, \delta)$ and $r(\theta, \delta^0)$ are continuous functions of $\theta$, it follows that $r(\theta, \delta) = r(\theta, \delta^0)$ everywhere, and the admissibility of $\delta^0$ is proved.

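Let now $\delta'(y)$ be any decision function for which $r(\theta, \delta') = r(\theta, \delta^0)$ for all $\theta$, i.e.,

$$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{1}{2}(x - \theta)^2} \left[ \delta^0_1(f(x)) - \delta'_1(f(x)) \right] dx = 0 \quad \text{for all } \theta. \tag{6.10}$$

Since $\delta^0_1(f(x)) - \delta'_1(f(x))$ is a bounded function of $x$, it follows from the uniqueness properties of the Laplace transform that (6.10) can hold only if $\delta'_1(y) = \delta^0_1(y)$ except perhaps on a set of measure zero. Thus, no nonrandomized decision function $\delta^*$ exists such that $r(\theta, \delta^*) = r(\theta, \delta^0)$ for all $\theta$.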
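The Bayes argument for (6.8) can be checked numerically. On each of the regions $-1 < y < 0$, $0 \le y < 1$ and $y \ge 1$ the rules considered here are constant, and these regions correspond to $-1 < X < 0$, $0 \le X < 1$ and $|X| \ge 1$, so every risk is a combination of normal probabilities. The Python sketch below approximates $\int g(\theta)\, r(\theta, \delta)\, d\theta$ by the midpoint rule. It shows that $\delta^0$ and the hypothetical nonrandomized rule that rejects $H_0$ whenever $y \ge 1$ have the same Bayes risk, because the posterior probabilities are equal on $y \ge 1$, while their risk functions differ. The comparison rule $\delta_1 \equiv \tfrac{1}{2}$ is also hypothetical.

```python
from math import erf, sqrt
from typing import Tuple

Rule = Tuple[float, float, float]   # acceptance probability on the three y-regions

def Phi(z: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def region_probs(theta: float) -> Tuple[float, float, float]:
    """P_theta of -1 < X < 0, 0 <= X < 1 and |X| >= 1."""
    p_neg = Phi(-theta) - Phi(-1.0 - theta)
    p_pos = Phi(1.0 - theta) - Phi(-theta)
    return p_neg, p_pos, 1.0 - p_neg - p_pos

def risk(theta: float, rule: Rule) -> float:
    """Zero-one loss; H_0 (theta in [-2, -1]) is true iff theta < 0."""
    accept = sum(a * p for a, p in zip(rule, region_probs(theta)))
    return 1.0 - accept if theta < 0 else accept

def bayes_risk(rule: Rule, n: int = 2000) -> float:
    """Midpoint rule for the integral of g(theta) r(theta, rule), g = 1/2 on both intervals."""
    grid = [1.0 + (k + 0.5) / n for k in range(n)]
    return sum(0.5 * (risk(-th, rule) + risk(th, rule)) for th in grid) / n

delta0 = (1.0, 0.0, 0.5)      # the rule (6.8)
nonrandom = (1.0, 0.0, 0.0)   # hypothetical: reject H_0 whenever y >= 1
half = (0.5, 0.5, 0.5)        # hypothetical: delta_1 = 1/2 everywhere
```

The equality of Bayes risks is consistent with the text: a Bayes solution need not be unique, and the nonrandomized Bayes rule above still fails to reproduce the risk function of $\delta^0$, as (6.10) requires.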

7. Compactness of $\Omega$ in the ordinary sense is not sufficient for the existence of $\varepsilon$-equivalent nonrandomized decision functions. Let $\Omega = \{F\}$ be the totality of density functions⁹ on the interval $0 \le x \le 1$ for which $F(x) \le c$ for every $x$, where $c$ is some positive constant greater than 2. The sample space will be the interval $0 \le x \le 1$. We shall say that the sequence $F_1, F_2, \ldots$ converges to $F$ if

$$\lim_{n \to \infty} \int_0^x F_n(y)\, dy = \int_0^x F(y)\, dy$$

for every real $x$. The set $\Omega$ is compact in the sense of the above convergence definition. Let $A$ be a fixed interval $a_1 \le x \le a_2$, where $0 < a_1 < a_2 < 1$. Let $D = \{d_1, d_2\}$ and define $W$ as follows:

$$W(F, d_1) + W(F, d_2) = 1, \qquad W(F, d_1) = 0 \text{ or } 1$$

according as the probability of $A$ under $F$ is rational or not. For any decision function $\delta(x)$, let $\delta_1(x)$ denote the probability assigned to $d_1$ by $\delta(x)$, i.e., $\delta_1(x) = \delta(d_1 \mid x)$.

⁹ Here $F(x)$ denotes a density function. This represents a change in notation from preceding sections.

Let δ′(x) be the decision function for which δ′₁(x) = 1/2. We shall prove that
δ′(x) is an admissible decision function. For suppose there exists a decision
function δ⁰(x) such that

(7.1)   r(F, δ⁰) ≤ r(F, δ′) = 1/2

for every F (r(F, δ′) = 1/2 for every F because W(F, d₁) + W(F, d₂) = 1), and
for some F₀ we have

(7.2)   r(F₀, δ⁰) < r(F₀, δ′).

Now, if Fᵢ → F₀ and W(Fᵢ, d₁) = W(F₀, d₁) for every i, then r(Fᵢ, δ) → r(F₀, δ)
for every decision function δ(x), and, in particular, for δ⁰(x). If Fᵢ → F₀ and
W(Fᵢ, d₁) + W(F₀, d₁) = 1 for every i, then r(Fᵢ, δ) → 1 − r(F₀, δ) for every
decision function δ(x) and, in particular, for δ⁰(x). Clearly, we can construct
two sequences of functions F such that each sequence converges to F₀, the
probability of A according to every member of the first sequence is rational,
and the probability of A according to every member of the second sequence is
irrational. Because of (7.2), 1 − r(F₀, δ⁰) > 1/2, so inequality (7.1) will be violated
for all but finitely many members of one of these two sequences. Hence δ′ is
admissible.

Let us now prove that there cannot exist a nonrandomized decision function
δ*(x) such that

(7.3)   r(F, δ*) ≤ r(F, δ′) + 1/4 = 3/4

for every F ∈ Ω. Suppose there were such a decision function δ*(x). Let H be
the set of x's where δ*₁(x) = 1, and let H̄ be the complement of H with respect
to the interval [0, 1]. If H is a set of measure zero or one then obviously (7.3)
is violated for some F. Thus, it is sufficient to consider the case when H is a
set of positive measure a < 1. Suppose for a moment that a > 1/2. Let G be
the density which is zero on H̄ and constant on H. From (7.3) it follows that
P{A | G} is rational. There exists a density G′ ∈ Ω such that P{H | G′} > 3/4
and P{A | G′} is irrational. But then (7.3) is violated for G′. If a ≤ 1/2, let G
be the density which is zero on H and constant on H̄. From (7.3) it follows
that P{A | G} is irrational. There exists a density G′ ∈ Ω such that P{H̄ | G′} >


¹⁰ The cumulative distribution functions are well known to be compact in the usual
convergence sense. Since the densities are bounded above, the limit cumulative
distribution function must be absolutely continuous.







3/4 and P{A | G′} is rational. But then (7.3) is violated for G′. Thus (7.3) can
never hold for every F ∈ Ω and the desired result is proved.


8. Sufficient conditions for the existence of ε-equivalent nonrandomized
decision functions. In this section we shall consider the nonsequential decision
problem (as described in the introduction), and we shall give sufficient conditions
for the existence of ε-equivalent nonrandomized decision functions. We shall
consider the following four metrics in the space Ω:

(8.1)   $\rho_1(F_1, F_2) = \sup_S \left| \int_S dF_1 - \int_S dF_2 \right|,$

where S is any measurable subset of R,

(8.2)   $\rho_2(F_1, F_2) = \sup_{d,\, x} \left| W(F_1, d, x) - W(F_2, d, x) \right|,$

(8.3)   $\rho_3(F_1, F_2) = \rho_1(F_1, F_2) + \rho_2(F_1, F_2),$

(8.4)   $\rho_4(F_1, F_2) = \sup_{\delta} \left| r(F_1, \delta) - r(F_2, \delta) \right|.$


First we prove the following lemma: 

Lemma 8.1. If Ω is conditionally compact in the sense of the metric ρ₃, then it
is conditionally compact in the sense of the metric ρ₄.

Proof. Let {Fᵢ} (i = 1, 2, ..., ad inf.) be a Cauchy sequence in the sense
of the metric ρ₃, i.e.,

(8.5)   $\lim_{i,j\to\infty} \rho_3(F_i, F_j) = 0.$

It follows from (8.5) and (8.3) that W(Fᵢ, d, x) converges, as i → ∞, to a
limit function W(d, x) uniformly in d and x, i.e.,

(8.6)   $\lim_{i\to\infty} W(F_i, d, x) = W(d, x)$

uniformly in d and x. Hence

(8.7)   $\lim_{i\to\infty} \int_D W(F_i, d, x)\, d\delta_x = \int_D W(d, x)\, d\delta_x$

uniformly in x and δ. Because of (8.5), we have


(8.8)   $\lim_{i,j\to\infty} \rho_1(F_i, F_j) = 0.$

Hence there exists a distribution function F₀(x) (not necessarily an element of
Ω) such that

(8.9)   $\lim_{i\to\infty} \rho_1(F_i, F_0) = 0.$







It follows from (8.7) and (8.9) that

(8.10)   $\lim_{i\to\infty} \int_R \left[ \int_D W(F_i, d, x)\, d\delta_x \right] dF_i(x) = \int_R \left[ \int_D W(d, x)\, d\delta_x \right] dF_0(x)$

uniformly in δ. Hence {Fᵢ} is a Cauchy sequence in the sense of the metric ρ₄,
and Lemma 8.1 is proved.

Next we prove 

Lemma 8.2. If D is conditionally compact in the sense of the metric (1.1) and
if δ is any decision function, then for any ε > 0 there exists a finite subset D′ of
D and a decision function δ′ such that δ′(D′ | x) = 1 identically in x and δ′ is
ε-equivalent to δ.

Proof. Since D is conditionally compact, it is possible to decompose D into
a finite number of disjoint subsets D₁, ..., Dᵤ such that the diameter of Dⱼ
is less than ε (j = 1, ..., u). Let dⱼ be an arbitrary but fixed point
of Dⱼ (j = 1, ..., u), put D′ = {d₁, ..., dᵤ}, and let δ′(x) be the decision function
determined by the condition

(8.11)   δ′(dⱼ | x) = δ(Dⱼ | x)   (j = 1, ..., u).

Clearly

(8.12)   $\left| \int_D W(F, d, x)\, d\delta_x - \int_D W(F, d, x)\, d\delta'_x \right| \le \varepsilon$

for all F and x. Hence,

(8.13)   |r(F, δ) − r(F, δ′)| ≤ ε

for all F and our lemma is proved.
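The construction in (8.11) can be tried on a toy problem. The sketch below is hypothetical and not from the paper: it assumes D = [0, 1], finitely many states θ, and a loss W(θ, d) = |d − θ| that does not depend on x, so that a cell of width less than ε has diameter less than ε in the metric (1.1). For each x it moves the mass that δ(· | x) puts on a cell Dⱼ onto a fixed representative dⱼ and checks the pointwise bound (8.12).

```python
import math

# Hypothetical setting: states theta, decisions d in [0, 1], loss |d - theta|.
# |W(theta, d1) - W(theta, d2)| <= |d1 - d2|, so a cell of width < eps has
# diameter < eps in the metric (1.1).
THETAS = [0.0, 0.25, 0.7, 1.0]


def loss(theta, d):
    return abs(d - theta)


def discretize(mixture, eps):
    """Lemma 8.2 construction at one x: the mass delta(D_j | x) of each cell
    D_j = [j/u, (j+1)/u) is moved onto the representative d_j of that cell.

    `mixture` is a randomized decision at x, given as (weight, d) pairs.
    """
    u = math.ceil(1.0 / eps) + 1  # u cells, each of width 1/u < eps
    collapsed = {}
    for w, d in mixture:
        j = min(int(d * u), u - 1)  # index of the cell containing d
        rep = (j + 0.5) / u         # representative d_j, the same for every x
        collapsed[rep] = collapsed.get(rep, 0.0) + w
    return [(w, d) for d, w in collapsed.items()]


def expected_loss(theta, mixture):
    """Inner integral in (8.12): the integral of W(theta, d) with respect to delta_x."""
    return sum(w * loss(theta, d) for w, d in mixture)


if __name__ == "__main__":
    eps = 0.1
    worst = 0.0
    for k in range(11):
        x = k / 10
        # A hypothetical randomized decision delta(. | x) with three atoms.
        delta_x = [(0.3, x * x), (0.5, 0.5 * x), (0.2, 1.0 - x)]
        coarse = discretize(delta_x, eps)
        for theta in THETAS:
            gap = abs(expected_loss(theta, delta_x) - expected_loss(theta, coarse))
            worst = max(worst, gap)
    print(f"largest change in expected loss: {worst:.4f} (bound eps = {eps})")
```

Because the representatives do not depend on x, the finite set D′ of the lemma is simply the set of cell midpoints.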

We are now in a position to prove the main theorem. 

Theorem 8.1. If the elements F(x) of Ω are atomless, if Ω is conditionally
compact in the sense of the metrics ρ₁ and ρ₂, and if D is conditionally compact in
the sense of the metric (1.1), then for any ε > 0 and for any decision function δ(x)
there exists an ε-equivalent nonrandomized decision function δ*(x).

Proof. Because of Lemma 8.2, it is sufficient to prove our theorem for finite
D (apply Lemma 8.2 with ε/2, and then the finite case with ε/2). Thus, we shall
assume that D consists of the elements d₁, ..., dₘ. It is easy to verify that
conditional compactness of Ω in the sense of both metrics ρ₁ and ρ₂ implies
conditional compactness in the sense of the metric ρ₃, and, because of Lemma 8.1,
also in the sense of the metric ρ₄. Thus, conditional compactness of Ω in the
sense of the metrics ρ₁ and ρ₂ implies the existence of a finite subset
Ω* = {F₁, ..., Fₖ} of Ω such that Ω* is ε/2-dense in Ω in the sense of the metric ρ₄.
Let δ* be a nonrandomized decision function that is equivalent to δ if Ω is
replaced by Ω*. The existence of such a δ* follows from Theorem 3.1. Since Ω* is
ε/2-dense in Ω (in the sense of the metric ρ₄), for every F ∈ Ω there is an Fⱼ ∈ Ω*
with ρ₄(F, Fⱼ) ≤ ε/2, and, because r(Fⱼ, δ*) = r(Fⱼ, δ), we have

(8.14)   |r(F, δ*) − r(F, δ)| ≤ |r(F, δ*) − r(Fⱼ, δ*)| + |r(Fⱼ, δ) − r(F, δ)| ≤ ε   for all F in Ω,

and our theorem is proved.
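The proof leans on Theorem 3.1 for the finite family Ω*. Its simplest case can be made explicit. The sketch below assumes a single atomless distribution F = N(0, 1), three decisions, and losses W(F, dⱼ) that do not depend on x (all hypothetical choices). The nonrandomized rule chooses dⱼ when F(x) falls in the j-th block of [0, 1), the blocks having lengths tⱼ = ∫ δ(dⱼ | x) dF(x), so both rules have the same risk. With several distributions such interval blocks generally do not suffice, and Lyapunov's theorem is needed instead.

```python
import math


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def randomized_rule(x):
    """Hypothetical randomized delta(. | x) over three decisions (softmax weights)."""
    scores = [0.0, x, -x * x]
    top = max(scores)
    expw = [math.exp(s - top) for s in scores]
    total = sum(expw)
    return [v / total for v in expw]


def decision_masses(rule, n_decisions, lo=-10.0, hi=10.0, steps=100_000):
    """t_j = integral of delta(d_j | x) dF(x) for F = N(0, 1), by the midpoint rule."""
    h = (hi - lo) / steps
    t = [0.0] * n_decisions
    for k in range(steps):
        x = lo + (k + 0.5) * h
        weight = std_normal_pdf(x) * h
        for j, p in enumerate(rule(x)):
            t[j] += weight * p
    return t


def purify(t):
    """Nonrandomized rule: choose d_j when F(x) lies in the j-th block of [0, 1),
    the blocks having lengths t_1, t_2, ...; then P_F(choose d_j) = t_j."""
    cuts = []
    acc = 0.0
    for tj in t:
        acc += tj
        cuts.append(acc)

    def rule(x):
        u = std_normal_cdf(x)
        for j, c in enumerate(cuts):
            if u < c:
                return j
        return len(cuts) - 1  # guards against rounding at the top end

    return rule


def one_hot(j, n):
    return [1.0 if i == j else 0.0 for i in range(n)]


if __name__ == "__main__":
    losses = [1.0, 0.3, 2.0]  # hypothetical W(F, d_j), independent of x
    t = decision_masses(randomized_rule, 3)
    pure = purify(t)
    t_star = decision_masses(lambda x: one_hot(pure(x), 3), 3)
    risk = sum(w * tj for w, tj in zip(losses, t))
    risk_star = sum(w * tj for w, tj in zip(losses, t_star))
    print(f"r(F, delta) = {risk:.5f}, r(F, delta*) = {risk_star:.5f}")
```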







We shall now introduce some notions with the help of which we shall be able
to strengthen Theorem 8.1. For any measurable subset S of R, let

(8.15)   $r(F, \delta \mid S) = \int_S \left[ \int_D W(F, d, x)\, d\delta_x \right] dF(x).$

We shall refer to the above expression as the contribution of the set S to the
risk. For any S we shall consider the following four metrics in Ω:


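(8.16)   $\rho_{1S}(F_1, F_2) = \sup_{S^*} \left| \int_{S^*} dF_1 - \int_{S^*} dF_2 \right|,$

where S* is any measurable subset of S,

(8.17)   $\rho_{2S}(F_1, F_2) = \sup_{d,\, x \in S} \left| W(F_1, d, x) - W(F_2, d, x) \right|,$

(8.18)   $\rho_{3S}(F_1, F_2) = \rho_{1S}(F_1, F_2) + \rho_{2S}(F_1, F_2),$

(8.19)   $\rho_{4S}(F_1, F_2) = \sup_{\delta} \left| r(F_1, \delta \mid S) - r(F_2, \delta \mid S) \right|.$

Finally let the metric $\rho_{5S}(d_1, d_2)$ in D be defined by

(8.20)   $\rho_{5S}(d_1, d_2) = \sup_{F,\, x \in S} \left| W(F, d_1, x) - W(F, d_2, x) \right|.$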
F ze 

We shall now prove the following stronger theorem: 

Theorem 8.2. Let all elements F of Ω be atomless. If there exists a decomposition
of R into a sequence {Rᵢ} (i = 1, 2, ..., ad inf.) of disjoint subsets such that
Ω is conditionally compact in the sense of the metrics $\rho_{1R_i}$ and $\rho_{2R_i}$ for each i, and
such that D is conditionally compact in the sense of the metric $\rho_{5R_i}$ for each i, then
for any ε > 0 and for any decision function δ there exists an ε-equivalent
nonrandomized decision function δ*.

Proof. Let {Rᵢ} be a decomposition of R for which the conditions of our
theorem are fulfilled. Let {εᵢ} be a sequence of positive numbers such that
$\sum_{i=1}^{\infty} \varepsilon_i = \varepsilon$. Let δ¹(x) be a decision function such that δ¹(x) = δ(x) for any x
not in R₁, δ¹(x) is nonrandomized over R₁ (for any x in R₁, δ¹(x) assigns the
probability one to a single point d in D), and such that

(8.21)   |r(F, δ¹ | R₁) − r(F, δ | R₁)| ≤ ε₁   for all F.

The existence of such a decision function δ¹ follows from Theorem 8.1 (replacing
R by R₁). After δ¹, ..., δⁱ⁻¹ have been defined (i > 1), let δⁱ be a decision
function such that δⁱ is nonrandomized over Rᵢ, δⁱ(x) = δⁱ⁻¹(x) for all x in
$\bigcup_{j=1}^{i-1} R_j$, δⁱ(x) = δ(x) for all x in $R - \bigcup_{j=1}^{i} R_j$, and such that

(8.22)   |r(F, δⁱ | Rᵢ) − r(F, δ | Rᵢ)| ≤ εᵢ   for all F in Ω.







The existence of such a decision function δⁱ follows again from Theorem 8.1.
Clearly δⁱ(x) converges to a limit δ*(x) as i → ∞. This limit decision function
δ*(x) is obviously nonrandomized and satisfies the condition

(8.23)   |r(F, δ | Rᵢ) − r(F, δ* | Rᵢ)| ≤ εᵢ

for all i and F. Since r(F, δ) = Σᵢ r(F, δ | Rᵢ), and likewise for δ*, summing (8.23)
over i gives |r(F, δ*) − r(F, δ)| ≤ Σᵢ εᵢ = ε for all F, which is Theorem 8.2.

The conditions of Theorem 8.2 will be fulfilled for a wide class of statistical 
decision problems. For example, this is true for the decision problems which 
satisfy the following six conditions: 

Condition 1. The sample space R is a finite dimensional Euclidean space.
All elements F(x) of Ω are absolutely continuous.

Condition 2. Ω admits a parametric representation, i.e., each element F of Ω
is associated with a parameter point θ in a finite dimensional Euclidean space E.

We shall denote the density function p(x) corresponding to the parameter
point θ by p(x, θ).

Condition 3. The set of parameter points θ which correspond to all elements F
of Ω is a closed subset of E.

We shall call this set of all parameter points θ the parameter space. Since
there is a one-to-one correspondence between the elements F of Ω and the points
θ of the parameter space, there is no danger of confusion if we denote the
parameter space also by Ω.

Condition 4. The density function p(x, θ) is continuous in θ ∈ Ω for every x.

Condition 5. The loss W(θ, d) when θ is true and the decision d is made does
not depend on x. D is conditionally compact in the sense of the metric
$\rho(d_1, d_2) = \sup_{\theta} |W(\theta, d_1) - W(\theta, d_2)|$.

Condition 6. For any bounded subset M of R, we have $\lim_{|\theta| \to \infty} \int_M p(x, \theta)\, dx = 0$.


We shall now show that the conditions of Theorem 8.2 are fulfilled for any
decision problem that satisfies Conditions 1-6. Let Sᵢ be the sphere in R with
center at the origin and radius i. Let R₁ = S₁ and $R_i = S_i - \bigcup_{j=1}^{i-1} R_j$
(i = 2, 3, ..., ad inf.). Condition 5 implies that D is conditionally compact in the
sense of the metric $\rho_{5R_i}$ for all i. It follows from Condition 5 and Theorem 2.1
in [3] that Ω is conditionally compact in the sense of the metric
$\rho(\theta_1, \theta_2) = \sup_d |W(\theta_1, d) - W(\theta_2, d)|$. Hence Ω is conditionally compact in the
sense of the metric $\rho_{2R_i}$ for each i. It remains to be shown that Ω is conditionally
compact in the sense of the metric $\rho_{1R_i}$ for each i. For this purpose, consider any
sequence {θⱼ} (j = 1, 2, ..., ad inf.) of parameter points. There are two cases possible:
(a) {θⱼ} admits a subsequence that converges in the Euclidean sense to a finite
point θ₀; (b) $\lim_{j\to\infty} |\theta_j| = \infty$. Let us consider first the case (a) and let $\{\theta_{j_k}\}$ be a
subsequence of {θⱼ} which converges to a finite point θ₀. It then follows from
Condition 4 and a theorem of Robbins [4] that $\{\theta_{j_k}\}$ is a Cauchy subsequence
in the sense of the metric $\rho_{1R_i}$ for each i. In case (b), Condition 6 implies that
the sequence {θⱼ} is a Cauchy sequence in the sense of the metric $\rho_{1R_i}$ for each i.
Thus, Ω is conditionally compact in the sense of the metric $\rho_{1R_i}$ for each i. This
completes the proof of our assertion that a decision problem that satisfies
Conditions 1-6 satisfies also the conditions of Theorem 8.2.
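The two cases (a) and (b) can be seen numerically for the normal location family p(x, θ) = N(θ, 1) on the line, used here only as an assumed example. The sketch computes $\rho_{1S}$ for S = [−3, 3], the one-dimensional sphere S₃; for any Rᵢ ⊂ S₃ the distance can only be smaller. It relies on the fact that the supremum over measurable S* ⊂ S is attained at S* = S ∩ {p₁ > p₂} or S ∩ {p₁ < p₂}.

```python
import math


def normal_pdf(x, mean):
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def rho_1S(mean1, mean2, a, b, steps=20_000):
    """rho_{1S}(F1, F2) for S = [a, b] and F_k = N(mean_k, 1).

    The sup over measurable S* in S of |F1(S*) - F2(S*)| is attained at
    S* = {p1 > p2} or S* = {p1 < p2}, so it equals the larger of the
    positive and negative parts of the integral of p1 - p2 over S.
    """
    h = (b - a) / steps
    pos = neg = 0.0
    for k in range(steps):
        x = a + (k + 0.5) * h
        diff = normal_pdf(x, mean1) - normal_pdf(x, mean2)
        if diff > 0:
            pos += diff * h
        else:
            neg -= diff * h
    return max(pos, neg)


if __name__ == "__main__":
    a, b = -3.0, 3.0
    # Case (a): theta_j = 1/j -> 0; the distances to the limit shrink.
    print([round(rho_1S(1.0 / j, 0.0, a, b), 4) for j in (1, 2, 5, 20, 100)])
    # Case (b): theta_j = j -> infinity; successive terms become rho_1S-close
    # because both put vanishing mass on the bounded set S (Condition 6).
    print([rho_1S(j, j + 1, a, b) for j in (2, 5, 10)])
```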


9. Application to the theory of games. Translation of the results of Section 2
into the language of the theory of games is immediate and we shall do this only
very briefly. The function W(Fᵢ, dⱼ, x) (i = 1, ..., p; j = 1, ..., m; x ∈ R)
of Section 1 is now called the pay-off function of a zero-sum two-person game.
The game is played as follows: Player I selects one of the integers 1, ..., p,
say i, without communicating his choice to player II. A random observation
x ∈ R on a chance variable whose distribution function is Fᵢ is obtained and
communicated to player II. The latter chooses one of the integers 1, ..., m,
say j. The game now ends with the receipt by player I and player II of the
respective sums W(Fᵢ, dⱼ, x) and −W(Fᵢ, dⱼ, x). Randomized (mixed) and
nonrandomized (pure) strategies are defined in the same manner as the
corresponding decision functions in Section 1. When the distribution functions
Fᵢ(x) (i = 1, ..., p) are all atomless the obvious analogues of Theorems 3.1
and 3.2 hold.

It should be remarked that the usual definition of randomized (mixed) strat- 
egy is not as general as the one given above. In the usual definition player II 
chooses, by a random mechanism independent of the random mechanism which 
yields the point x, some one of a (usually finite) number of nonrandomized 
(pure) strategies, and then plays the game according to the nonrandomized 
strategy selected. In our definition (used in [3]) the random choice is allowed 
to depend on x. Clearly our method of randomization includes the usual one as
a special case. The relation between the two methods of randomization will be 
discussed by two of the authors in a forthcoming paper [7]. 
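The claim that the usual mixed strategy is a special case of randomization depending on x can be checked in a small hypothetical game (the means, pay-offs, thresholds and probabilities below are illustrative assumptions, not from the paper). Choosing the pure rule "j = 1 iff x > tₖ" with probability πₖ before x is seen gives player II the same expected pay-off as the single randomized rule δ(j = 1 | x) = Σ πₖ over the thresholds tₖ below x.

```python
import math


def normal_cdf(x, mean):
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0)))


def normal_pdf(x, mean):
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


# Hypothetical game: player I picks i in {0, 1}; x ~ N(MEANS[i], 1) is shown to
# player II, who picks j in {0, 1}; player I receives PAYOFF[i][j] (no x-dependence).
MEANS = [0.0, 1.0]
PAYOFF = [[0.0, 1.0], [1.0, 0.0]]

# Usual mixed strategy: pick the pure rule "j = 1 iff x > t_k" with probability
# pi_k, independently of x, and then follow it.
THRESHOLDS = [0.0, 0.5, 1.2]
PI = [0.2, 0.5, 0.3]


def payoff_mixed(i):
    """Expected pay-off to player I under the usual mixed strategy (exact)."""
    total = 0.0
    for t, p in zip(THRESHOLDS, PI):
        prob_j1 = 1.0 - normal_cdf(t, MEANS[i])
        total += p * (PAYOFF[i][0] * (1.0 - prob_j1) + PAYOFF[i][1] * prob_j1)
    return total


def behavioral_prob_j1(x):
    """Randomization in the paper's sense: delta(j = 1 | x) may depend on x."""
    return sum(p for t, p in zip(THRESHOLDS, PI) if x > t)


def payoff_behavioral(i, lo=-10.0, hi=11.0, steps=200_000):
    """Expected pay-off under delta(. | x), by midpoint integration over x."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        q = behavioral_prob_j1(x)
        total += normal_pdf(x, MEANS[i]) * h * (PAYOFF[i][0] * (1.0 - q) + PAYOFF[i][1] * q)
    return total


if __name__ == "__main__":
    for i in (0, 1):
        print(i, round(payoff_mixed(i), 5), round(payoff_behavioral(i), 5))
```

The converse fails in general: a rule such as δ(j = 1 | x) = 1/2 for every x is not produced by any mixture of the pure threshold rules used here.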

Suppose that the number of possible decisions is at most denumerable, and 
that the decision procedure consists in choosing at random and in advance of 
the observations, one of a finite number of nonrandomized decision functions. 
The sample space can be divided into an at most denumerable number of sets 
in each of which only a finite number of decisions is possible (the possible de- 
cisions vary from set to set). In each set our results are applicable. Since the 
number of sets is denumerable the resultant decision function is measurable. 
We conclude: It follows from our results that if a decision procedure consists of 
selecting with preassigned probabilities one of a finite number of nonrandomized 
decision functions with the number of possible decisions at most denumerably 
infinite, and if the possible distributions are finite in number and atomless, then 
there exists an equivalent nonrandomized decision function. More general 
results can be obtained for this case (where one chooses at random and in ad- 
vance of the observations, one of a finite number of nonrandomized decision 
functions). By application of the methods of Sections 4 and 8 the requirement 







that the number of possible decisions be denumerable can be easily removed. 
The procedures are straightforward and we omit them.


REFERENCES 

[1] A. Dvoretzky, A. Wald, and J. Wolfowitz, "Elimination of randomization in certain
problems of statistics and of the theory of games," Proc. Nat. Acad. Sci., Vol. 36
(1950), pp. 256-260.

[2] A. Lyapunov, "Sur les fonctions-vecteurs complètement additives," Izvestiya Akad.
Nauk SSSR, Ser. Mat., Vol. 4 (1940), pp. 465-478.

[3] A. Wald, Statistical Decision Functions, John Wiley & Sons, 1950.

[4] Herbert Robbins, "Convergence of distributions," Annals of Math. Stat., Vol. 19
(1948), pp. 72-75.

[5] J. von Neumann and O. Morgenstern, Theory of Games and Economic Behavior,
Princeton University Press, 1944.

[6] A. Dvoretzky, A. Wald, and J. Wolfowitz, "Relations among certain ranges of
vector measures," Pacific Journal of Mathematics (1951).

[7] A. Wald and J. Wolfowitz, "Two methods of randomization in statistics and the
theory of games," Annals of Mathematics, to be published.





ON MINIMAX STATISTICAL DECISION PROCEDURES
AND THEIR ADMISSIBILITY¹


By Colin R. Blyth
University of California, Berkeley


Summary. This paper is concerned with the problem of making a decision on 
the basis of a sequence of observations on a random variable. Two loss functions, 
each depending on the distribution of the random variable, the number of ob- 
servations taken, and the decision made, are assumed given. Minimax problems 
can be stated for weighted sums of the two loss functions, or for either one sub- 
ject to an upper bound on the expectation of the other. Under suitable conditions 
it is shown that solutions of the first type of problem provide solutions for all 
problems of the latter types, and that admissibility for a problem of the first 
type implies admissibility for problems of the latter types. Two examples are 
given: Estimation of the mean of a random variable which is (1) normal with 
known variance, (2) rectangular with known range. The resulting minimax 
estimates are, with a small class of exceptions, proved admissible among the 
class of all procedures with continuous risk functions. The two loss functions 
are in each case the number of observations, and an arbitrary nondecreasing func- 
tion of the absolute error of estimate. Extensions to a function of the number 
of observations for the first loss function are indicated, and two examples are 
given for the normal case where the sample size can or must be randomised 
among more than a consecutive pair of integers. 

1. Introduction. We will consider a sequence X₁, X₂, X₃, ... of independent
random variables, each having the same distribution F, which belongs to a class
Ω of possible distributions. A sequential decision procedure S is a rule by which,
having observed x₁, ..., xₘ (m = 0, 1, 2, ...), we make one of the following
decisions:

(a) Take an observation on Xₘ₊₁.

(b) Stop experimentation and make a terminal decision d = d(x₁, ..., xₘ).
We will consider two non-negative loss functions W₁(n, d, F) and W₂(n, d, F).
Each can be thought of as an economic loss incurred when the X's have
distribution F and the terminal decision d is made after n observations have been
taken. In the simplest applications one W will be a function of n only (cost of
experimentation) and the other W will be a function of d and F only (loss
incurred by making the decision d when the X's have distribution F). We will
denote by E(Wᵢ | F, S) the expected value of Wᵢ when the X's have distribution
F and the decision procedure S is used. Let ξ be any probability measure defined
on some class of subsets of Ω. We will denote by E(Wᵢ | ξ, S) the expected value

1 Work done under the sponsorship of the Office of Naval Research, and submitted in 
partial satisfaction of the requirements for the degree of Ph.D. at the University of Cali- 
fornia, Berkeley. 




of Wᵢ, given the (a priori) distribution ξ over Ω, when the decision procedure
S is used.

Minimax problems, first considered by Wald, have been formulated in three
ways for the situation just described. We may seek a decision procedure S which
will (i) subject to an upper bound on $\sup_\Omega E(W_1 \mid F, S)$, minimise $\sup_\Omega E(W_2 \mid F, S)$;
or (ii) subject to an upper bound on $\sup_\Omega E(W_2 \mid F, S)$, minimise $\sup_\Omega E(W_1 \mid F, S)$;
or (iii) minimise $\sup_\Omega \{c\,E(W_1 \mid F, S) + E(W_2 \mid F, S)\}$, where 0 < c < ∞ is a
constant. We will show that in certain cases it suffices to find solutions for all
problems (iii) since these solutions provide solutions for all problems (i) and (ii).

We will first discuss the corresponding Bayes problems, not for their own 
interest, but because they can often be used to find solutions for the minimax 
problems in which we are really interested. 

2. Bayes problems. The following three classes of Bayes problems have been
considered: Given a priori the distribution ξ over Ω, we want to find a (Bayes)
procedure which will

(i)′ subject to E(W₁ | ξ, S) ≤ L₁, minimise E(W₂ | ξ, S),
(ii)′ subject to E(W₂ | ξ, S) ≤ L₂, minimise E(W₁ | ξ, S),
(iii)′ minimise $r_c(\xi, S) = cE(W_1 \mid \xi, S) + E(W_2 \mid \xi, S)$.

Let $\mathcal{S}_c$ be the class of all solutions of problem (iii)′ for a given c, 0 < c < ∞.
Let $\mathcal{S} = \bigcup_{0<c<\infty} \mathcal{S}_c$ be the class of all solutions of problems (iii)′, 0 < c < ∞.

Lemma 1. If S′ ∈ $\mathcal{S}$ has E(W₁ | ξ, S′) = L₁, then S′ is a solution of the problem
(i)′ for this L₁. If S″ is any other solution of this problem (i)′, then E(W₁ | ξ, S″) =
L₁ and S″ ∈ $\mathcal{S}$. Similarly for problems (ii)′.

Proof. Let S′ ∈ $\mathcal{S}_c$. Suppose there exists a procedure S* having

E(W₁ | ξ, S*) ≤ E(W₁ | ξ, S′) = L₁,
E(W₂ | ξ, S*) < E(W₂ | ξ, S′).
Then
cE(W₁ | ξ, S*) + E(W₂ | ξ, S*) < cE(W₁ | ξ, S′) + E(W₂ | ξ, S′).

This implies S′ ∉ $\mathcal{S}_c$, which is false. This contradiction shows that no such S*
can exist. Hence S′ is a solution of this problem (i)′.

If S″ is any other solution of this problem (i)′ we must have E(W₂ | ξ, S″) =
E(W₂ | ξ, S′). Suppose that E(W₁ | ξ, S″) < E(W₁ | ξ, S′) = L₁. Then

cE(W₁ | ξ, S″) + E(W₂ | ξ, S″) < cE(W₁ | ξ, S′) + E(W₂ | ξ, S′),

implying the contradiction S′ ∉ $\mathcal{S}_c$. Hence E(W₁ | ξ, S″) = E(W₁ | ξ, S′) = L₁.
We therefore have $r_c(\xi, S'') = r_c(\xi, S')$, and so S″ ∈ $\mathcal{S}_c$.







Lemma 2. If S′ ∈ $\mathcal{S}$, S″ ∈ $\mathcal{S}$, then
E(W₁ | ξ, S′) < E(W₁ | ξ, S″) ⟹ E(W₂ | ξ, S′) > E(W₂ | ξ, S″),
E(W₁ | ξ, S′) = E(W₁ | ξ, S″) ⟹ E(W₂ | ξ, S′) = E(W₂ | ξ, S″).

Lemma 3. If S′ ∈ $\mathcal{S}_{c'}$ and S″ ∈ $\mathcal{S}_{c''}$ where c′ < c″, then
E(W₁ | ξ, S′) ≥ E(W₁ | ξ, S″),
E(W₂ | ξ, S′) ≤ E(W₂ | ξ, S″).

Proof. Assume one of the following:
L₁ = E(W₁ | ξ, S′) < E(W₁ | ξ, S″) = L₁ + r,
L₂ = E(W₂ | ξ, S′) > E(W₂ | ξ, S″) = L₂ − s.

The other then follows from Lemma 2. Write c″ = c′ + a. Here r > 0, s > 0,
a > 0. Then

$r_{c'}(\xi, S') = c'L_1 + L_2,$
$r_{c'}(\xi, S'') = c'L_1 + L_2 + (c'r - s),$
$r_{c''}(\xi, S') = c'L_1 + L_2 + aL_1,$
$r_{c''}(\xi, S'') = c'L_1 + L_2 + aL_1 + (c'r - s + ar).$

S′ ∈ $\mathcal{S}_{c'}$ ⟹ c′r − s ≥ 0,
and S″ ∈ $\mathcal{S}_{c''}$ ⟹ c′r − s + ar ≤ 0.
Since ar > 0 these last two results cannot both be true. This contradiction shows
that neither of the assumed inequalities can be true, and proves the lemma.
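Lemmas 1 and 3 can be checked on a finite toy menu of procedures, each summarised only by the pair (E(W₁ | ξ, S), E(W₂ | ξ, S)). The menu below is a hypothetical assumption for illustration: pairs (n, 1/n), as for a sample size n with Bayes squared-error risk 1/n, plus one dominated procedure. Exact fractions are used so that ties in $r_c$ are detected exactly.

```python
from fractions import Fraction

# Hypothetical menu of procedures, each given as (E(W1 | xi, S), E(W2 | xi, S)).
MENU = [(Fraction(n), Fraction(1, n)) for n in range(1, 41)]
MENU.append((Fraction(5), Fraction(1, 3)))  # a dominated procedure


def solutions(menu, c):
    """The class S_c: all members of the menu minimising r_c = c*E(W1) + E(W2)."""
    risks = [c * w1 + w2 for w1, w2 in menu]
    best = min(risks)
    return [s for s, r in zip(menu, risks) if r == best]


def check_monotone(menu, cs):
    """Lemma 3: for c' < c'', any S' in S_c' and S'' in S_c'' satisfy
    E(W1 | S') >= E(W1 | S'') and E(W2 | S') <= E(W2 | S'')."""
    sols = [solutions(menu, c) for c in cs]
    for a in range(len(cs)):
        for b in range(a + 1, len(cs)):
            for w1a, w2a in sols[a]:
                for w1b, w2b in sols[b]:
                    if not (w1a >= w1b and w2a <= w2b):
                        return False
    return True


def solves_constrained(menu, s):
    """Lemma 1: s minimises E(W2) subject to E(W1) <= E(W1 | s)."""
    feasible = [w2 for w1, w2 in menu if w1 <= s[0]]
    return s[1] == min(feasible)


if __name__ == "__main__":
    cs = [Fraction(k, 400) for k in range(1, 401)]  # increasing values of c
    print("monotone in c:", check_monotone(MENU, cs))
    print(solutions(MENU, Fraction(1, 110)))  # a tie: n = 10 and n = 11
```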
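Let us write

$\underline{L}_1 = \inf_{S \in \mathcal{S}} E(W_1 \mid \xi, S), \qquad \overline{L}_1 = \sup_{S \in \mathcal{S}} E(W_1 \mid \xi, S),$

$\underline{L}_2 = \inf_{S \in \mathcal{S}} E(W_2 \mid \xi, S), \qquad \overline{L}_2 = \sup_{S \in \mathcal{S}} E(W_2 \mid \xi, S),$

where the improper value ∞ is admitted for the upper bounds.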
Lemma 4.

E(W₁ | ξ, S) < $\underline{L}_1$ ⟹ E(W₂ | ξ, S) = ∞,
E(W₂ | ξ, S) < $\underline{L}_2$ ⟹ E(W₁ | ξ, S) = ∞.

Proof. Suppose that S is a procedure for which E(W₁ | ξ, S) = L₁ < $\underline{L}_1$ and
E(W₂ | ξ, S) = L₂ < ∞.
If $\overline{L}_2 = \infty$, there exists some $S_c \in \mathcal{S}_c$ having E(W₁ | ξ, S_c) ≥ $\underline{L}_1$ and
E(W₂ | ξ, S_c) > L₂; but we would then have $r_c(\xi, S_c) > r_c(\xi, S)$, contradicting
the fact that $S_c \in \mathcal{S}_c$.





If $\overline{L}_2 < \infty$, then for $S_c \in \mathcal{S}_c$ we have

$cE(W_1 \mid \xi, S_c) + E(W_2 \mid \xi, S_c) \ge c\underline{L}_1 + \underline{L}_2 > cL_1 + L_2$

for c sufficiently large, again contradicting the fact that $S_c \in \mathcal{S}_c$. This completes
the proof of the first part of the lemma; the second part is proved in the same way.

Lemma 4 shows that no problem (i)′ with L₁ < $\underline{L}_1$ has a solution. Lemmas 2
and 4 show that if S ∈ $\mathcal{S}$ has E(W₁ | ξ, S) = $\overline{L}_1$, then E(W₂ | ξ, S) = $\underline{L}_2$ and S
is a solution of all problems (i)′ with L₁ ≥ $\overline{L}_1$. Similar remarks hold for problems
(ii)′.

Theorem. If for every L₁ satisfying $\underline{L}_1 \le L_1 \le \overline{L}_1$ there exists S ∈ $\mathcal{S}$ having
E(W₁ | ξ, S) = L₁, then the class of all solutions of problems (i)′ with
$\underline{L}_1 \le L_1 \le \overline{L}_1$ coincides with $\mathcal{S}$. Similarly for problems (ii)′. If $\overline{L}_1 = \infty$ or
$\overline{L}_2 = \infty$ the appropriate equality signs must be omitted.

This theorem is an immediate consequence of Lemma 1.

Note. From monotonicity (Lemma 3) we know that as c → c′ from one side
and $S_c \in \mathcal{S}_c$, E(W₁ | ξ, S_c) → some limit L₁ from one side and E(W₂ | ξ, S_c) →
some limit L₂ from one side. If this implies the existence of a procedure S having
E(W₁ | ξ, S) = L₁ and E(W₂ | ξ, S) = L₂ whenever L₁ and L₂ are finite, it is easy
to show that S ∈ $\mathcal{S}_{c'}$, and that the conditions for the theorem are satisfied.
However, the conditions themselves are usually easy to check once we have found $\mathcal{S}$.

Suppose that for a given Ω, ξ, W₁, W₂ we have found the class $\mathcal{S}$ of all
solutions of problems (iii)′, 0 < c < ∞, and find the conditions for the above theorem
satisfied. Solving any problem (i)′ or (ii)′ is now reduced to choosing the
appropriate member of $\mathcal{S}$.

3. Minimax problems. The following three classes of minimax problems have
been considered: We want to find a (minimax) procedure which will

(i) subject to $\sup_\Omega E(W_1 \mid F, S) \le L_1$, minimise $\sup_\Omega E(W_2 \mid F, S)$,

(ii) subject to $\sup_\Omega E(W_2 \mid F, S) \le L_2$, minimise $\sup_\Omega E(W_1 \mid F, S)$,

(iii) minimise $\sup_\Omega \{cE(W_1 \mid F, S) + E(W_2 \mid F, S)\}$.

If there is an a priori distribution which is least favorable in problem (iii)′
for all c, 0 < c < ∞, this distribution is also least favorable for all problems
(i)′ and (ii)′. The Bayes solutions with respect to this distribution are minimax
solutions of the corresponding problems stated in this section. In many problems,
however, this easy approach is not available.

Lemma 5. Suppose some problem (iii) has a solution S′ with

$\sup_\Omega E(W_1 \mid F, S') = L_1, \qquad \sup_\Omega E(W_2 \mid F, S') = L_2,$

$\sup_\Omega \{cE(W_1 \mid F, S') + E(W_2 \mid F, S')\} = cL_1 + L_2.$

(These conditions will in particular hold if either $\sup_\Omega E(W_1 \mid F, S') = L_1$ and
E(W₂ | F, S′) = L₂ for all F ∈ Ω, or $\sup_\Omega E(W_2 \mid F, S') = L_2$ and E(W₁ | F, S′) = L₁
for all F ∈ Ω.) Then S′ is a solution of the problem (i) with this L₁, and a solution
of the problem (ii) with this L₂.
Proof. Suppose there is a procedure S having

$\sup_\Omega E(W_1 \mid F, S) \le L_1, \qquad \sup_\Omega E(W_2 \mid F, S) < L_2.$

Then we would have

$\sup_\Omega \{cE(W_1 \mid F, S) + E(W_2 \mid F, S)\} \le c \sup_\Omega E(W_1 \mid F, S) + \sup_\Omega E(W_2 \mid F, S) < cL_1 + L_2 = \sup_\Omega \{cE(W_1 \mid F, S') + E(W_2 \mid F, S')\},$

contradicting the fact that S′ is a solution of this problem (iii). Hence no such
S can exist, and S′ is a solution of the problem (i) with this L₁. Similarly S′ is
a solution of the problem (ii) with this L₂.

Let 𝒞 be any class of solutions of problems (iii), each member S of which
satisfies the condition

$\sup_\Omega \{cE(W_1 \mid F, S) + E(W_2 \mid F, S)\} = c \sup_\Omega E(W_1 \mid F, S) + \sup_\Omega E(W_2 \mid F, S)$

for the c of the problem (iii) that it solves. Let $\mathcal{C}_c$ denote those members of 𝒞
which are solutions of the problem (iii) for this particular c. Then the following
two lemmas can be proved in exactly the same way as the corresponding lemmas
of Section 2.

Lemma 2a. If S′ ∈ 𝒞, S″ ∈ 𝒞, then

$\sup_\Omega E(W_1 \mid F, S') < \sup_\Omega E(W_1 \mid F, S'') \implies \sup_\Omega E(W_2 \mid F, S') > \sup_\Omega E(W_2 \mid F, S''),$

and

$\sup_\Omega E(W_1 \mid F, S') = \sup_\Omega E(W_1 \mid F, S'') \implies \sup_\Omega E(W_2 \mid F, S') = \sup_\Omega E(W_2 \mid F, S'').$

Lemma 3a. If S′ ∈ $\mathcal{C}_{c'}$ and S″ ∈ $\mathcal{C}_{c''}$, where c′ < c″, then

$\sup_\Omega E(W_1 \mid F, S') \ge \sup_\Omega E(W_1 \mid F, S''),$

$\sup_\Omega E(W_2 \mid F, S') \le \sup_\Omega E(W_2 \mid F, S'').$

Suppose that we have found such a class 𝒞 of solutions of problems (iii)
and that there exists S ∈ 𝒞 having $\sup_\Omega E(W_i \mid F, S) = L_i$ whenever
$\inf_{n,d,F} W_i(n, d, F) \le L_i \le \sup_{n,d,F} W_i(n, d, F)$, i = 1, 2. (Omit appropriate equality
signs if either upper bound is ∞.) Then solving any problem (i) or (ii) is reduced
to choosing the appropriate member of 𝒞.







In order to find solutions of problems (iii) in the examples we consider, the
following lemma, which is due to E. Lehmann, will be needed.

Lemma 6. Consider the minimax problem of finding a procedure which minimises
$\sup_\Omega r(F, S)$. (This may be subject to conditions as in (i) and (ii), or not as in (iii).)
Let Sₖ be a Bayes procedure with respect to the a priori distribution ξₖ over Ω,
k = 1, 2, .... Then for any procedure S,

$\sup_\Omega r(F, S) \ge r(\xi_k, S) \ge r(\xi_k, S_k)$

for all k. Therefore

$\sup_\Omega r(F, S) \ge \limsup_{k\to\infty} r(\xi_k, S_k).$

A sufficient condition for the procedure S₀ to be minimax is therefore

$r(F, S_0) \le \limsup_{k\to\infty} r(\xi_k, S_k)$

for all F ∈ Ω.
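Lemma 6 can be seen at work in the simplest special case of the normal example of Section 5, under assumptions not made in the text: a fixed sample size n, squared-error loss, and a priori distributions ξₖ = N(0, τₖ²) with τₖ² → ∞. The Bayes procedure is the shrunken mean s·x̄ with s = nτ²/(nτ² + 1), whose Bayes risk τ²/(nτ² + 1) increases to 1/n; the sample mean has constant risk 1/n, so it meets the sufficient condition of the lemma.

```python
def bayes_risk(n, tau2):
    """Bayes risk of the posterior mean s * xbar, s = n tau2 / (n tau2 + 1),
    for theta ~ N(0, tau2), X_i ~ N(theta, 1), and squared-error loss."""
    return tau2 / (n * tau2 + 1.0)


def risk_of_shrinkage(theta, n, s):
    """Risk of s * xbar at theta: variance s^2 / n plus squared bias."""
    return s * s / n + ((1.0 - s) * theta) ** 2


def averaged_risk(n, tau2):
    """Average of risk_of_shrinkage over theta ~ N(0, tau2), in closed form."""
    s = n * tau2 / (n * tau2 + 1.0)
    return s * s / n + (1.0 - s) ** 2 * tau2


if __name__ == "__main__":
    n = 4
    for tau2 in (1.0, 10.0, 100.0, 10_000.0):
        print(tau2, bayes_risk(n, tau2), averaged_risk(n, tau2))
    # xbar (s = 1) has constant risk 1/n, the limit of the Bayes risks above,
    # so by Lemma 6 it is minimax for this fixed n.
    print(risk_of_shrinkage(3.7, n, 1.0), 1.0 / n)
```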

4. Admissibility. Admissible procedures (not necessarily solutions) for the
problems stated in Section 3 are defined as follows:

A procedure S is admissible for a particular problem (iii) if there is no
procedure S* having

$r_c(F, S^*) \le r_c(F, S)$ for all F ∈ Ω,

with strict inequality for at least one F ∈ Ω, where $r_c(F, S) = cE(W_1 \mid F, S) + E(W_2 \mid F, S)$.

A procedure S is admissible for a particular problem (i) if there is no
procedure S* having

$\sup_\Omega E(W_1 \mid F, S^*) \le L_1,$

E(W₂ | F, S*) ≤ E(W₂ | F, S) for all F ∈ Ω,

with strict inequality for at least one F ∈ Ω. Admissibility is defined in a similar
way for problem (ii).
Lemma 7. Suppose S is an admissible procedure for some problem (iii). Then
if E(W₁ | F, S) = L₁ for all F ∈ Ω, S is admissible for the problem (i) with this L₁.
And if E(W₂ | F, S) = L₂ for all F ∈ Ω, S is admissible for the problem (ii) with
this L₂.

Proof. Suppose that E(W₁ | F, S) = L₁ for all F ∈ Ω, and that S is not admissible
for the problem (i) with this L₁. Then there is a procedure S* having
$\sup_\Omega E(W_1 \mid F, S^*) \le L_1$ and E(W₂ | F, S*) ≤ E(W₂ | F, S) for all F ∈ Ω, with
strict inequality for at least one F ∈ Ω. We therefore have

$r_c(F, S^*) = cE(W_1 \mid F, S^*) + E(W_2 \mid F, S^*) \le cL_1 + E(W_2 \mid F, S) = cE(W_1 \mid F, S) + E(W_2 \mid F, S) = r_c(F, S)$

for all F ∈ Ω, with strict inequality for at least one F ∈ Ω. That is, S cannot be
admissible for any problem (iii), a contradiction which proves the first part of
the lemma. The second part is proved in the same way.

If for a problem there is a least favorable distribution for which the Bayes
solution is unique, this is the unique minimax solution and is therefore admissible.
If Ω is a parametric family and all possible procedures have risks continuous in
the parameter θ, and ξ is a least favorable distribution which assigns positive
probability to every interval of values of θ, then any Bayes solution for this ξ
is minimax and admissible. When can we conclude that minimax solutions
obtained by the method of Lemma 6 are admissible? In Sections 5 and 7 we will
show for particular examples that the solutions so obtained, except for trivial
exceptions, are all admissible among the class of procedures with continuous
risk functions. We might hope that all constant risk minimax solutions so
obtained are admissible, but will see that this is not so.

The method used here for proving admissibility of minimax solutions involves 
examination of the Bayes solutions used to obtain them. In the examples
considered, if W₂ is continuous, this method works both for classical fixed sample
size problems and for the sequential problems (i), (ii), (iii) subject to the addi- 
tional restriction that the number of observations is bounded. 

Admissibility is proved for a number of examples by Hodges and Lehmann 
in [4] by a completely different method, which involves no appeal to Bayes solu- 
tions, and which works for certain fixed sample size problems in which the method 
of this paper fails. Their method, however, is restricted to number of observa- 
tions and squared error of estimate for loss functions, and among sequential 
problems will handle only (i), again subject to the additional restriction that 
the number of observations is bounded. 

5. Example: normal. Let X₁, X₂, ... be a sequence of independent random
variables, each being N(θ, 1), i.e., normal with mean θ and variance 1. A point
estimate z is wanted for the mean θ. Let

W₁(n, z, θ) = n,   W₂(n, z, θ) = W(z − θ),

where W is a non-decreasing function of |z − θ|. The three classes of minimax
problems are

(i) subject to $\sup_\theta E_\theta(n) \le M$, minimise $\sup_\theta E_\theta W(z - \theta)$,

(ii) subject to $\sup_\theta E_\theta W(z - \theta) \le L$, minimise $\sup_\theta E_\theta(n)$,

(iii) minimise $\sup_\theta \{cE_\theta(n) + E_\theta W(z - \theta)\}$.







Note. This problem was first considered by Stein and Wald in [1]. They solved
problems (i) and (ii) for the case W(z − θ) = 0 or 1 according as |z − θ| ≤ a/2
or > a/2; their estimates are thus confidence intervals of fixed length a. For
this same case Wolfowitz in [2] solved problems (iii) and showed that these
solutions provide solutions for problems (i) and (ii). Wolfowitz also obtained
solutions of problems (iii) for the general W(z − θ), non-decreasing in |z − θ|.
The question of admissibility is not considered in [1] or [2].

The remainder of this section will be concerned with proving the following
results.

Theorem. To a given c there corresponds either an integer N or a pair of
consecutive integers N, N + 1. A class of solutions of the problem (iii) for this c are
procedures in which the only possible sample sizes are N (or N, N + 1) and in
which the estimate used is $z = \frac{1}{n}\sum_{i=1}^{n} X_i$ if n > 0. If N ≠ 0, all such solutions are
admissible among the class of procedures with continuous risk functions. The class
of solutions so obtained, 0 < c < ∞, provides solutions for all problems (i) and (ii).
We will find solutions for problems (iii) by first finding Bayes solutions for the 
corresponding problems (iii)’ when @ has the a priori distribution N(0, o*). The 
Bayes problem is to find a sequential estimation procedure which will minimise 
the risk 


Tan f {cE,(n) + EyW(z — 0)}6° 9 ao. 


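We will assume that $W(z - \theta)$ increases with $|z - \theta|$ slowly enough so that

$$\int_{-\infty}^{\infty} W(z - \theta)\,e^{-(\theta - \mu)^2/2\sigma^2}\,d\theta < \infty$$

for some $\sigma_0, \mu_0, z_0$, and hence for all $\sigma < \sigma_0$, $\mu$, $z$.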

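Let us first determine what our estimate $z$ for $\theta$ should be if we stop after having observed $x_1, \dots, x_m$. For this we need to know the a posteriori distribution

$$p(\theta \mid x_1, \dots, x_m) = p(\theta, x_1, \dots, x_m)/p(x_1, \dots, x_m)$$
$$= C(x_1, \dots, x_m)\,e^{-(1/2\sigma^2)\theta^2 - \frac12\sum_{i=1}^{m}(x_i - \theta)^2}$$
$$= C_1(x_1, \dots, x_m)\,e^{-((m\sigma^2 + 1)/2\sigma^2)\left(\theta - \sigma^2\sum_{i=1}^{m} x_i/(m\sigma^2 + 1)\right)^2}.$$

That is, $\theta$, given $x_1, \dots, x_m$, is $N\left(\frac{\sigma^2\sum_{i=1}^{m} x_i}{m\sigma^2 + 1}, \frac{\sigma^2}{m\sigma^2 + 1}\right)$. Given that we observe $x_1, \dots, x_m$ and then stop and estimate $z(x_1, \dots, x_m)$ for $\theta$, our (a posteriori) risk is therefore

$$cm + \frac{\sqrt{m\sigma^2 + 1}}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\infty} W(z - \theta)\,e^{-((m\sigma^2 + 1)/2\sigma^2)\left(\theta - \sigma^2\sum_{i=1}^{m} x_i/(m\sigma^2 + 1)\right)^2}\,d\theta.$$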





30 COLIN R. BLYTH 


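Since $W(z - \theta)$ is a non-decreasing function of $|z - \theta|$, this risk is clearly minimised by choosing $z = \frac{\sigma^2}{m\sigma^2 + 1}\sum_{i=1}^{m} x_i$, where we interpret $\sum_{i=1}^{m} x_i = 0$ if $m = 0$. The minimum value is

$$r_{c,\sigma}(m) = cm + \frac{\sqrt{m\sigma^2 + 1}}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\infty} W(y)\,e^{-((m\sigma^2 + 1)/2\sigma^2)y^2}\,dy.$$

This does not depend on the observations, but only on the number of observations. Since $r_{c,\sigma}(m) \to \infty$ as $m \to \infty$, it is clear that the sequence $r_{c,\sigma}(m): m = 0, 1, 2, \dots$ assumes its minimum value at a finite set $n'_1, \dots, n'_{p'}$ $[p' = p'(c, \sigma)]$ of integers $m$. Hence if $\theta$ is $N(0, \sigma^2)$ a priori, any of the following procedures is Bayes: the only possible sample sizes are $n'_1, \dots, n'_{p'}$; if the sample size is $m$, the estimate $z = \frac{\sigma^2}{m\sigma^2 + 1}\sum_{i=1}^{m} x_i$ is used for $\theta$.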
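As a concrete check, the sketch below takes the squared-error case $W(y) = y^2$, an illustrative assumption rather than a case singled out in the text. There the integral in $r_{c,\sigma}(m)$ reduces to the posterior variance $\sigma^2/(m\sigma^2 + 1)$, so the Bayes sample sizes and the posterior-mean estimate can be computed exactly with rational arithmetic. The function names are hypothetical.

```python
from fractions import Fraction

def bayes_risk(c, sigma2, m):
    """r_{c,sigma}(m) when W(y) = y**2: the posterior expected loss of the
    posterior mean is the posterior variance sigma2 / (m*sigma2 + 1)."""
    return c * m + sigma2 / (m * sigma2 + 1)

def bayes_sample_sizes(c, sigma2, m_max=1000):
    """All m in 0..m_max at which r_{c,sigma}(m) is smallest.
    r_{c,sigma}(m) grows like c*m, so a modest bound suffices for the c used here."""
    risks = [bayes_risk(c, sigma2, m) for m in range(m_max + 1)]
    best = min(risks)
    return [m for m, r in enumerate(risks) if r == best]

def bayes_estimate(xs, sigma2):
    """Posterior mean sigma2 * sum(xs) / (m*sigma2 + 1); it is 0 when m = 0."""
    m = len(xs)
    return sigma2 * sum(xs) / (m * sigma2 + 1)

c, sigma2 = Fraction(1, 6), Fraction(4)
print(bayes_sample_sizes(c, sigma2))                        # [2]
print(bayes_estimate([Fraction(1), Fraction(3)], sigma2))   # 16/9
```

As $\sigma^2$ grows, the estimate approaches the sample mean and $r_{c,\sigma}(m)$ approaches $cm + 1/m$.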


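To obtain minimax solutions, consider a sequence of $\sigma$'s tending to $\infty$. As $\sigma \to \infty$,

$$r_{c,\sigma}(m) \to r_c(m) = cm + \frac{\sqrt{m}}{\sqrt{2\pi}}\int_{-\infty}^{\infty} W(y)\,e^{-my^2/2}\,dy$$

for $m = 1, 2, \dots$, and $r_{c,\sigma}(0) \to r_c(0) = \sup_y W(y)$.

Clearly $r_c(m): m = 0, 1, 2, \dots$ assumes its minimum value at a finite set $n_1, \dots, n_p$ $[p = p(c)]$ of integers $m$.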
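For a general loss $W$ the limit $r_c(m)$ usually has to be evaluated numerically. The following is a minimal sketch using a hand-written composite Simpson rule, so it needs only the standard library; the truncation of the integral to $|y| \le 12/\sqrt{m}$ and all names are illustrative choices. For $W(y) = y^2$ the result should be $cm + 1/m$, and for $W(y) = |y|$ it should be $cm + \sqrt{2/(\pi m)}$, which gives two independent checks.

```python
import math

def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule on [lo, hi] with an even number n of panels."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3

def r_limit(c, m, W, sup_W):
    """r_c(m) = c*m + sqrt(m / (2*pi)) * integral of W(y) exp(-m y^2 / 2) dy
    for m > 0, and r_c(0) = sup W.  The integral is truncated to
    |y| <= 12 / sqrt(m), where the Gaussian factor is below e**-72."""
    if m == 0:
        return sup_W
    span = 12 / math.sqrt(m)

    def integrand(y):
        return W(y) * math.exp(-m * y * y / 2)

    return c * m + math.sqrt(m / (2 * math.pi)) * simpson(integrand, -span, span)

print(r_limit(0.1, 4, lambda y: y * y, math.inf))   # close to 0.1*4 + 1/4 = 0.65
```

The truncation is only safe for losses that grow slowly enough, which is in the spirit of the growth assumption made above.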

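Consider the following class $\mathcal{C}_c$ of sequential procedures: the only possible sample sizes are $n_1, \dots, n_p$. If the sample size is 0, estimate 0 for $\theta$ (any estimate whatever will do as well). If the sample size is $m > 0$, estimate $z = \frac{1}{m}\sum_{i=1}^{m} x_i$ for $\theta$. Writing $n_1 < n_2 < \dots < n_p$, the risk of any such procedure, if $n_1 = 0$, is

$$r_c(\theta) = P_\theta(n = 0)W(\theta) + \sum_{i=2}^{p}P_\theta(n = n_i)\left\{cn_i + E_\theta W\left(\frac{1}{n_i}\sum_{j=1}^{n_i} x_j - \theta\right)\right\}$$
$$= P_\theta(n = 0)W(\theta) + \sum_{i=2}^{p}P_\theta(n = n_i)\left\{cn_i + \frac{\sqrt{n_i}}{\sqrt{2\pi}}\int_{-\infty}^{\infty} W(y)\,e^{-n_iy^2/2}\,dy\right\}$$
$$\le P_\theta(n = 0)\sup_y W(y) + \sum_{i=2}^{p}P_\theta(n = n_i)\,r_c(n_i)$$
$$= \sup_y W(y) = r_c(n_i), \quad i = 2, \dots, p, \text{ for all } \theta.$$

Similarly, if $n_1 \ne 0$, it is easy to show that

$$r_c(\theta) = r_c(n_i), \quad i = 1, \dots, p, \text{ for all } \theta.$$

It follows at once from Lemma 6 that every member of $\mathcal{C}_c$ is a minimax procedure for the problem (iii) with this $c$.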





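We will next show that

$$r_c(m) = cm + \frac{\sqrt{m}}{\sqrt{2\pi}}\int_{-\infty}^{\infty} W(y)\,e^{-my^2/2}\,dy \quad \text{for } m > 0,$$
$$r_c(0) = \sup_y W(y)$$

is a convex function of $m$. Let $m_0$ be the smallest integer for which $r_c(m) < \infty$; this is the same for all $c$. Then $r_c(m)$ is continuous in $m$ for all $m \ge m_0$. Convexity of $r_c(m)$ is equivalent to convexity of

$$g(m) = \frac{\sqrt{m}}{\sqrt{2\pi}}\int_{-\infty}^{\infty} W(y)\,e^{-my^2/2}\,dy.$$

It is easy to show that for $m_0 \le m < \infty$, differentiation under the integral sign any number of times is valid for $g(m)$. Therefore

$$g'(m) = \frac{1}{2\sqrt{2\pi m}}\int_{-\infty}^{\infty} W(y)\,e^{-my^2/2}(1 - my^2)\,dy,$$
$$g''(m) = \frac{1}{4m\sqrt{2\pi m}}\int_{-\infty}^{\infty} W(y)\,e^{-my^2/2}(m^2y^4 - 2my^2 - 1)\,dy$$
$$= \frac{1}{4m^2\sqrt{2\pi}}\int_{-\infty}^{\infty} W\left(\frac{x}{\sqrt{m}}\right)e^{-x^2/2}(x^4 - 2x^2 - 1)\,dx.$$

Here

$$x^4 - 2x^2 - 1 < 0 \quad \text{for } 0 \le |x| < \sqrt{1 + \sqrt{2}}, \qquad x^4 - 2x^2 - 1 > 0 \quad \text{for } \sqrt{1 + \sqrt{2}} < |x|.$$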


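Also, $W(y)$ is non-decreasing as $y \ge 0$ increases, and we exclude from consideration the trivial case $W(y) = $ constant. Since $W(x/\sqrt{m}) \le W\left(\sqrt{(1 + \sqrt2)/m}\right)$ where the factor $x^4 - 2x^2 - 1$ is negative, and $W(x/\sqrt{m}) \ge W\left(\sqrt{(1 + \sqrt2)/m}\right)$ where it is positive, it follows that

$$g''(m) > \frac{1}{2m^2\sqrt{2\pi}}\int_0^{\sqrt{1+\sqrt2}} W\left(\sqrt{\frac{1+\sqrt2}{m}}\right)e^{-x^2/2}(x^4 - 2x^2 - 1)\,dx + \frac{1}{2m^2\sqrt{2\pi}}\int_{\sqrt{1+\sqrt2}}^{\infty} W\left(\sqrt{\frac{1+\sqrt2}{m}}\right)e^{-x^2/2}(x^4 - 2x^2 - 1)\,dx$$
$$= \frac{1}{2m^2\sqrt{2\pi}}\,W\left(\sqrt{\frac{1+\sqrt2}{m}}\right)\int_0^{\infty} e^{-x^2/2}(x^4 - 2x^2 - 1)\,dx = 0,$$

the last integral vanishing because $\int_0^\infty x^4e^{-x^2/2}\,dx = 3\int_0^\infty e^{-x^2/2}\,dx$ and $\int_0^\infty x^2e^{-x^2/2}\,dx = \int_0^\infty e^{-x^2/2}\,dx$. That is, $g(m)$ is strictly convex for all $m \ge m_0$. Hence $r_c(m)$ is continuous and strictly convex for $m \ge m_0$.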
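The closed form for $g''(m)$ can be checked numerically. The sketch below, an illustration rather than part of the proof, compares the integral formula with a central finite difference of $g$ for two hypothetical losses: $W(y) = y^2$, where $g(m) = 1/m$ and $g''(m) = 2/m^3$, and $W(y) = |y|$, where $g(m) = \sqrt{2/(\pi m)}$ and $g''(m) = \tfrac34\sqrt{2/\pi}\,m^{-5/2}$. In both cases the value is positive, as the convexity argument requires.

```python
import math

def simpson(f, lo, hi, n=4000):
    """Composite Simpson rule on [lo, hi] with an even number n of panels."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3

def g(m, W):
    """g(m) = sqrt(m / (2*pi)) * integral of W(y) exp(-m y^2 / 2) dy (truncated)."""
    span = 12 / math.sqrt(m)
    return math.sqrt(m / (2 * math.pi)) * simpson(
        lambda y: W(y) * math.exp(-m * y * y / 2), -span, span)

def g2_formula(m, W):
    """(1 / (4 m^2 sqrt(2 pi))) * integral of W(x/sqrt m) e^{-x^2/2} (x^4 - 2x^2 - 1) dx."""
    def f(x):
        return W(x / math.sqrt(m)) * math.exp(-x * x / 2) * (x**4 - 2 * x * x - 1)
    return simpson(f, -12.0, 12.0) / (4 * m * m * math.sqrt(2 * math.pi))

def g2_fd(m, W, h=1e-2):
    """Central second difference of g at m."""
    return (g(m + h, W) - 2 * g(m, W) + g(m - h, W)) / h**2

def square(y):
    return y * y

print(g2_formula(2.0, square), g2_fd(2.0, square))   # both close to 2 / 2**3 = 0.25
```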

For any given $c$, it follows that $r_c(m): m = 0, 1, 2, \dots$ is smallest for at most two consecutive integers $m$. If at two consecutive integers, these must lie on opposite sides of the $m$ which minimises $r_c(m)$. (Thus $p = 1$ or $2$. The same results are now obvious for any $r_{c,\sigma}(m)$, given $c$ and $\sigma$.)

For all $c$ sufficiently large, $r_c(m): m = 0, 1, 2, \dots$ is minimised by $m = m_0$ only. Now, for any given $m$, $r_c(m)$, $\partial r_c(m)/\partial m$ and $r_c(m + 1) - r_c(m)$ are continuous and strictly increasing functions of $c$, $0 < c < \infty$. Therefore as we decrease $c$ continuously from such a sufficiently large value, $r_c(m): m = 0, 1, 2, \dots$ remains smallest for $m = m_0$ only, until a point $c'$ is reached for which $r_{c'}(m): m = 0, 1, 2, \dots$ is minimised by $m = m_0$ and $m = m_0 + 1$. As we continue to decrease $c$, for $\varepsilon$ sufficiently small and $c' - \varepsilon < c < c'$, $r_c(m): m = 0, 1, 2, \dots$ is clearly smallest for $m = m_0 + 1$ only. This remains true until we reach a point $c''$ for which $r_{c''}(m): m = 0, 1, 2, \dots$ is minimised by $m = m_0 + 1$ and $m = m_0 + 2$. As we continue to decrease $c$, $r_c(m): m = 0, 1, 2, \dots$ is smallest for larger and larger $m$'s, which tend to $\infty$ as $c \to 0$, because, for a given $m$, $\partial r_c(m)/\partial m < 0$ for all $c$ sufficiently small. We note that only for a denumerable set of $c$'s is $r_c(m): m = 0, 1, 2, \dots$ minimised by two consecutive $m$'s; for almost all $c$'s this minimum occurs for only one $m$.
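For the squared-error case $W(y) = y^2$ (again an illustrative assumption), $r_c(0) = \infty$, so $m_0 = 1$, and $r_c(m) = cm + 1/m$. Then $r_c(m) = r_c(m + 1)$ exactly when $c = 1/(m(m + 1))$, so the denumerable set of $c$'s with two minimising sample sizes is $\{1/2, 1/6, 1/12, \dots\}$. The sketch below confirms this with exact rational arithmetic; the helper names and the search bound `m_max` are hypothetical.

```python
from fractions import Fraction

def r_sq(c, m):
    """r_c(m) = c*m + 1/m for W(y) = y**2 and m >= 1 (r_c(0) is infinite)."""
    return c * m + Fraction(1, m)

def minimisers(c, m_max=1000):
    """Sample sizes in 1..m_max at which r_c(m) is smallest."""
    risks = {m: r_sq(c, m) for m in range(1, m_max + 1)}
    best = min(risks.values())
    return [m for m, r in risks.items() if r == best]

def switch_point(m):
    """The c at which m and m + 1 tie: c = 1/m - 1/(m + 1) = 1/(m(m + 1))."""
    return Fraction(1, m * (m + 1))

for m in (1, 2, 3):
    c = switch_point(m)
    print(c, minimisers(c))          # two consecutive minimisers at each switch point
print(minimisers(Fraction(1, 10)))   # [3]: c lies strictly between 1/12 and 1/6
```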

Let $\mathcal{C}'_c$ consist of those members of $\mathcal{C}_c$ in which the sample size does not depend on $\theta$. Included are the procedures in which the sample size is randomised, independently of the observations, between the possible sample sizes. Let $\mathcal{C} = \bigcup_{0 < c < \infty}\mathcal{C}'_c$. Now $E_\theta(n)$ is constant for any member of $\mathcal{C}$, implying $\sup_\theta\{E_\theta(n) + E_\theta(W)\} = \sup_\theta E_\theta(n) + \sup_\theta E_\theta(W)$. Lemmas 5, 2a and 3a are therefore valid for $\mathcal{C}$.

For every $M$, $m_0 \le M < \infty$, there is clearly a member of $\mathcal{C}$ having $E_\theta(n) \equiv M$. Using continuity considerations it is easy to show that for every $L$, $0 < L < \infty$, there is a member of $\mathcal{C}$ having $\sup_\theta E_\theta(W) = L$. It follows from Lemma 5 that $\mathcal{C}$ contains a solution for every problem (i) with $M \ge m_0$ (problems (i) with $M < m_0$ have no solutions) and a solution for every problem (ii). Selection of the appropriate member of $\mathcal{C}$ is obvious for any particular problem (i), but requires successive approximation for any particular problem (ii).
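In the squared-error case $W(y) = y^2$ (an assumption made only for illustration, with $m_0 = 1$), the member of $\mathcal{C}$ with $E_\theta(n) \equiv M$ randomises between $N = \lfloor M \rfloor$ and $N + 1$ with probabilities $N + 1 - M$ and $M - N$, and its constant loss component is $E_\theta W = (N + 1 - M)/N + (M - N)/(N + 1)$. The sketch below makes the contrast in the last sentence concrete: problem (i) is solved by a formula, while problem (ii) is solved here by bisection, one possible form of successive approximation. Names are hypothetical.

```python
import math

def mix_for_M(M):
    """Randomise between N = floor(M) and N + 1 so that E(n) = M exactly.
    Returns N and the probability of taking N observations."""
    N = math.floor(M)
    return N, N + 1 - M

def sup_EW(M):
    """Constant E_theta W for W(y) = y**2 under that mixture: p/N + (1 - p)/(N + 1)."""
    N, p = mix_for_M(M)
    return p / N + (1 - p) / (N + 1)

def solve_problem_ii(L, lo=1.0, hi=1e6, tol=1e-10):
    """Smallest M >= 1 with sup E W <= L, by bisection.  sup_EW is continuous
    and strictly decreasing in M, so [lo, hi] keeps the root when L > 1/hi."""
    if sup_EW(lo) <= L:
        return lo
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if sup_EW(mid) <= L:
            hi = mid
        else:
            lo = mid
    return hi

print(mix_for_M(2.25))          # (2, 0.75): take 2 observations with probability 3/4
print(solve_problem_ii(0.3))    # about 3.4
```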


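Are the members of $\mathcal{C}' = \bigcup_{0 < c < \infty}\mathcal{C}_c$ admissible for the problems (iii) for which they are solutions? We will answer this question first for those members of $\mathcal{C}'$ for which 0 is not a possible sample size.

For a given $c$, suppose that $r_c(m): m = 0, 1, 2, \dots$ is minimised by $m = N \ne 0$ only, or by $m = N \ne 0$ and $m = N + 1$ only. Since, for every $m$, $r_{c,\sigma}(m) \to r_c(m)$ as $\sigma \to \infty$, it is clear that if $\theta$ has the distribution $\lambda_\sigma = N(0, \sigma^2)$ a priori with $\sigma$ sufficiently large, say $\sigma > K_1$, then $N$ and $N + 1$ are the only possible sample sizes for a Bayes solution. We observe further that

$$r_{c,\sigma}(N) = cN + \frac{1}{\sqrt{2\pi}}\sqrt{N + \frac{1}{\sigma^2}}\int_{-\infty}^{\infty} W(y)\,e^{-(N + 1/\sigma^2)y^2/2}\,dy,$$
$$r_{c,\sigma}(N + 1) = c(N + 1) + \frac{1}{\sqrt{2\pi}}\sqrt{N + 1 + \frac{1}{\sigma^2}}\int_{-\infty}^{\infty} W(y)\,e^{-(N + 1 + 1/\sigma^2)y^2/2}\,dy.$$

If $r_c(N) = r_c(N + 1)$, as we are assuming, it follows from the convexity of $g(m) = \frac{\sqrt{m}}{\sqrt{2\pi}}\int_{-\infty}^{\infty} W(y)\,e^{-my^2/2}\,dy$ that $r_{c,\sigma}(N) < r_{c,\sigma}(N + 1)$: indeed $r_{c,\sigma}(m) = cm + g(m + 1/\sigma^2)$, and the increment $g(N + 1 + 1/\sigma^2) - g(N + 1/\sigma^2)$ exceeds $g(N + 1) - g(N) = -c$. Hence $N$ is the only possible sample size for a Bayes procedure, $\sigma > K_1$. Therefore, for this given $c$, the (minimax) risk function for every member of $\mathcal{C}_c$ is

$$r_c(\theta) \equiv r = cN + \frac{\sqrt{N}}{\sqrt{2\pi}}\int_{-\infty}^{\infty} W(y)\,e^{-Ny^2/2}\,dy,$$

and the Bayes risk for a priori $\lambda_\sigma$, $\sigma > K_1$, is

$$r_\sigma = cN + \frac{1}{\sqrt{2\pi}}\sqrt{N + \frac{1}{\sigma^2}}\int_{-\infty}^{\infty} W(y)\,e^{-(N + 1/\sigma^2)y^2/2}\,dy.$$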


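If the procedures in $\mathcal{C}_c$ are non-admissible for this problem (iii), there must exist a procedure $S^*$ having risk function $r^*(\theta) \le r$ for all $\theta$, with strict inequality for at least one $\theta$. Assuming $r^*(\theta)$ continuous, this implies strict inequality for some interval of values of $\theta$. We can therefore find two constants $a$ and $k$, $0 < a < r$ and $0 < k < \infty$, such that

$$\frac{1}{2k}\int_{-k}^{k} r^*(\theta)\,d\theta = a.$$

Also, given any fixed $\varepsilon$, $0 < \varepsilon < 1 - a/r$, we can find $K > K_1$ so large that for $-k \le \theta \le k$,

$$1 - \varepsilon < e^{-(1/2\sigma^2)\theta^2} \le 1 \quad \text{whenever } \sigma > K.$$

Then for all $\sigma > K$ we have

$$\int_{-\infty}^{\infty} r^*(\theta)\lambda_\sigma(\theta)\,d\theta = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\infty} r^*(\theta)\,e^{-(1/2\sigma^2)\theta^2}\,d\theta$$
$$\le \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-k}^{k} r^*(\theta)\,e^{-(1/2\sigma^2)\theta^2}\,d\theta + \frac{r}{\sqrt{2\pi}\,\sigma}\int_{|\theta| > k} e^{-(1/2\sigma^2)\theta^2}\,d\theta$$
$$\le \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-k}^{k} r^*(\theta)\,d\theta + r - \frac{r}{\sqrt{2\pi}\,\sigma}\int_{-k}^{k}(1 - \varepsilon)\,d\theta$$
$$= \frac{2ka}{\sqrt{2\pi}\,\sigma} + r - \frac{2kr(1 - \varepsilon)}{\sqrt{2\pi}\,\sigma} = r - \frac{b}{\sigma},$$

where $b = 2k(r - a - r\varepsilon)/\sqrt{2\pi} > 0$ is a constant.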


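Now the Bayes risk for $\lambda_\sigma$, $\sigma > K$, is

$$r_\sigma = r - \frac{1}{\sqrt{2\pi}}\left\{\sqrt{N}\int_{-\infty}^{\infty} W(y)\,e^{-Ny^2/2}\,dy - \sqrt{N + \frac{1}{\sigma^2}}\int_{-\infty}^{\infty} W(y)\,e^{-(N + 1/\sigma^2)y^2/2}\,dy\right\}.$$

We have seen that for $m \ge N$ the function $g(m) = \frac{\sqrt{m}}{\sqrt{2\pi}}\int_{-\infty}^{\infty} W(y)\,e^{-my^2/2}\,dy$ has continuous derivatives $g'(m) < 0$ and $g''(m) > 0$. It follows that

$$r_\sigma = r - \left\{g(N) - g\left(N + \frac{1}{\sigma^2}\right)\right\} \ge r + \frac{1}{\sigma^2}\,g'(N),$$

$g'(N)$ being a negative constant. It is clear that for $\sigma$ sufficiently large,

$$r_\sigma \ge r + \frac{1}{\sigma^2}\,g'(N) > r - \frac{b}{\sigma} \ge \int_{-\infty}^{\infty} r^*(\theta)\lambda_\sigma(\theta)\,d\theta.$$

But this contradicts the fact that $r_\sigma$ is the Bayes risk for $\lambda_\sigma$, and so no such $S^*$ can exist. Therefore, if 0 is not a possible sample size for members of $\mathcal{C}_c$, every member of $\mathcal{C}_c$ is admissible among the class of procedures with continuous risk functions, for the problem (iii) with this $c$.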
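The decisive point is that $r_\sigma - r$ is of order $1/\sigma^2$, while the hypothetical improvement $S^*$ gains a margin of order $1/\sigma$. For $W(y) = y^2$ (assumed only for this illustration) $g(m) = 1/m$, so $r_\sigma - r = 1/(N + \sigma^{-2}) - 1/N$ and $g'(N) = -1/N^2$. The sketch checks the convexity bound and finds the first $\sigma$ on a grid at which $r_\sigma$ exceeds $r - b/\sigma$ for an arbitrary illustrative $b$; all names are hypothetical.

```python
def bayes_gap(N, sigma):
    """r_sigma - r = g(N + 1/sigma^2) - g(N) with g(m) = 1/m (case W(y) = y**2)."""
    return 1 / (N + sigma**-2) - 1 / N

def convexity_bound(N, sigma):
    """g'(N) / sigma^2 with g'(m) = -1/m^2; convexity gives bayes_gap >= this."""
    return -1 / (N**2 * sigma**2)

def first_sigma_beating(b, N, sigmas):
    """First sigma in `sigmas` with r_sigma > r - b/sigma, or None."""
    for s in sigmas:
        if bayes_gap(N, s) > -b / s:
            return s
    return None

for N in range(1, 6):
    for s in (1.0, 2.0, 10.0, 100.0):
        assert bayes_gap(N, s) >= convexity_bound(N, s)
print(first_sigma_beating(0.01, 2, [1.0, 10.0, 100.0, 1000.0]))   # 100.0
```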

Furthermore, $E_\theta(n)$ and $E_\theta(W)$ are both constants for members of $\mathcal{C}$ which belong to such a $\mathcal{C}_c$. It follows from Lemma 7 that such members of $\mathcal{C}$ are admissible among the class of procedures with continuous risk functions, for the problems (i) and (ii) for which they are minimax.

If $W$ is continuous and the number of observations is bounded, it can be shown that $r^*(\theta)$ is continuous. Thus if $W$ is continuous, we have proved admissibility among the class of procedures with $n$ bounded.

There remains the question of admissibility for those $\mathcal{C}_c$ in which the possible sample sizes are 0 and 1, or 0 only. If 0 and 1 are both possible sample sizes, two members of $\mathcal{C}_c$ are A: take 0 observations and estimate 0 for $\theta$; and B: take 1 observation and estimate $x_1$ for $\theta$. Procedure A has risk function $r(\theta \mid A) = W(\theta)$. Procedure B has risk function $r(\theta \mid B) = c + \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} W(y)\,e^{-y^2/2}\,dy = \sup_y W(y)$. It easily follows that, except for A, all members of $\mathcal{C}_c$ are non-admissible. The procedure A is admissible. For let $S$ be any procedure which assigns probability $\alpha > 0$ to sample sizes $> 0$. Then we have

$$r(0 \mid S) \ge \alpha c + W(0) > W(0) = r(0 \mid A),$$

so that no such $S$ could make A non-admissible. Let $T$ be any procedure which assigns probability 1 to the sample size 0. For any such procedure the risk $r(\theta \mid T)$ is an average, for some distribution of $z$, of $W(z - \theta)$. Let $(-\theta_0, \theta_0)$ be the interval or point on which $W(\theta) = W(0)$. Clearly we cannot have $r(\theta \mid T) \le W(0)$ for all $\theta \in (-\theta_0, \theta_0)$ unless $T$ coincides with A with probability 1. Hence no such $T$ could make A non-admissible, and it now follows that A is admissible. This proof also shows that A is admissible when 0 is the only possible sample size for members of $\mathcal{C}_c$.







Similar arguments show that every member of $\mathcal{C}$ which belongs to a $\mathcal{C}_c$ of the above types is admissible for the problems (i) and (ii) for which it is minimax.

6. Extensions of normal example. An outline of the solution of the example of Section 5 shows that the same method can be used for other examples. Let $X_1, X_2, \dots$ be independent random variables, each having the same density $p(x \mid \theta)$ with respect to a fixed measure $\mu$. A point estimate $z$ is wanted for the real parameter $\theta$. Let

$$W_1(n, z, \theta) = W_1(n), \qquad W_2(n, z, \theta) = W_2(z, \theta),$$

and define the three classes of minimax problems as usual.

Suppose that we can find a sequence $\xi_1, \xi_2, \dots$ of a priori distributions and a double sequence $z_{k,0}, z_{k,1}(x_1), z_{k,2}(x_1, x_2), \dots$; $k = 1, 2, \dots$ of estimates of $\theta$, such that if $\theta$ has a priori distribution $\xi_k$ and we observe $x_1, \dots, x_m$ and then stop, the a posteriori risk is minimised by estimating $z_{k,m}$ for $\theta$, and the minimum value is

$$r_{c,k}(m) = cW_1(m) + \int W_2(z_{k,m}, \theta)\,p(\theta \mid x_1, \dots, x_m; \xi_k)\,d\theta,$$

depending not on the observations but only on the number $m$ of observations (and on $c$, $k$). Clearly the same sequences will do for all $c$, $0 < c < \infty$, and for all functions $W_1(n)$.

Then the following procedures are Bayes for the problem (iii)$'$ with this $c$ and with $\theta$ having a priori distribution $\xi_k$: the only possible sample sizes are those which minimise $r_{c,k}(m): m = 0, 1, 2, \dots$; if the sample size is $m$, estimate $z_{k,m}$ for $\theta$.

Suppose, for a particular $\xi_k$ and for some particular $c$, that these possible sample sizes are $n_1 < n_2 < \cdots$. Since $r_{c,k}(m)$ is continuous in $c$ for any $k$, $m$, it is clear that for $\varepsilon$ sufficiently small and $c < c' < c + \varepsilon$, no value of $m$ other than $n_1, n_2, \dots$ could minimise $r_{c',k}(m): m = 0, 1, 2, \dots$. And a minimum for any $m > n_1$ would contradict Lemma 3. Hence for $c < c' < c + \varepsilon$, $r_{c',k}(m): m = 0, 1, 2, \dots$ is minimised by $m = n_1$ only. It follows that randomisations in sample size for Bayes solutions are possible only for a denumerable set of $c$'s; for almost all $c$, only one fixed sample size is possible.

Suppose that as $k \to \infty$ every sequence $z_{1,m}, z_{2,m}, \dots$ tends to a limit $z_m$, and that $r_{c,k}(m) \to r_c(m) = cW_1(m) + L(m)$ for $m = 0, 1, 2, \dots$. If the procedure "take a sample of fixed size $m$ and estimate $z_m$ for $\theta$" has risk function $r_c^{(m)}(\theta) = cW_1(m) + L_\theta(m) \le r_c(m)$ for all $\theta$, the following procedures are minimax for the problem (iii) with this $c$: the only possible sample sizes are those which minimise $r_c(m): m = 0, 1, 2, \dots$; if the sample size is $m$, estimate $z_m$ for $\theta$.

Extension to problems (i) and (ii) can now be carried out as in Section 5. We note that a problem of this type, when solved for one $W_1(m)$, can be solved for any other $W_1(m)$ merely by determining the proper sample sizes. If $r_c(m)$ is a convex function of $m$, the possible sample sizes are always one integer or two consecutive integers. But if $r_c(m)$ is not convex, practically any set of integers can be possible sample sizes, as indicated in the following examples.

EXAMPLE. Let $X_1, X_2, \dots$ be independent random variables, each being $N(\theta, 1)$. A point estimate $z$ is wanted for the mean $\theta$. Let

$$W_1(n) = \frac{n}{3} \quad \text{for } n = 0, 1, 2, 3,$$
$$W_1(n) = 1 + \frac{n - 3}{105} \quad \text{for } n = 4, 5, \dots,$$
$$W_2(z, \theta) = (z - \theta)^2.$$

Thus the first three observations each cost $\frac13$, and subsequent observations each cost $\frac{1}{105}$. Making the necessary substitutions in Section 5, we get

$$r_c(m) = \frac{cm}{3} + \frac{1}{m} \quad \text{for } m = 1, 2, 3,$$
$$r_c(m) = c\left(1 + \frac{m - 3}{105}\right) + \frac{1}{m} \quad \text{for } m = 4, 5, \dots.$$

For $c = 1$ it is easy to show that $r_1(m): m = 1, 2, \dots$ is minimised by $m = 2$ and $m = 10$, both giving the value $\frac76$. For $c \ne 1$, $r_c(m): m = 1, 2, \dots$ is minimised by one integer or by a pair of consecutive integers. Solutions are obtained for all problems (i), (ii), (iii) as in Section 5. The solution obtained for any problem (i) with $\frac23 \le M \le \frac{16}{15}$ is the following:

with probability $\frac{16 - 15M}{6}$ take 2 observations,

with probability $\frac{15M - 10}{6}$ take 10 observations,

estimate $z = \frac{1}{n}\sum_{i=1}^{n} x_i$ for $\theta$.
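The claims of this example can be verified with exact rational arithmetic. The sketch below assumes, as the phrase "as usual" in Section 6 suggests, that problem (i) constrains $\sup_\theta E_\theta W_1(n) \le M$; it recomputes $r_c(m)$, finds the minimising sample sizes, and derives the mixing probabilities of the randomised solution from the requirement $E\,W_1(n) = M$. Function names and the search bound `m_max` are hypothetical.

```python
from fractions import Fraction

def W1(n):
    """Cost of n observations: 1/3 each for the first three, 1/105 each afterwards."""
    return Fraction(n, 3) if n <= 3 else 1 + Fraction(n - 3, 105)

def r(c, m):
    """r_c(m) = c * W1(m) + 1/m for squared-error loss, m >= 1."""
    return c * W1(m) + Fraction(1, m)

def minimisers(c, m_max=300):
    """Sample sizes in 1..m_max at which r_c(m) is smallest."""
    risks = {m: r(c, m) for m in range(1, m_max + 1)}
    best = min(risks.values())
    return [m for m, v in risks.items() if v == best]

def mixture_for_M(M):
    """Probabilities of taking 2 and 10 observations so that E W1(n) = M."""
    p2 = (W1(10) - M) / (W1(10) - W1(2))
    return p2, 1 - p2

print(minimisers(Fraction(1)), r(Fraction(1), 2))   # [2, 10] 7/6
print(mixture_for_M(Fraction(1)))                    # (Fraction(1, 6), Fraction(5, 6))
```

With $c = \frac12$ the same code returns the consecutive pair $[14, 15]$, in line with the statement for $c \ne 1$.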


EXAMPLE. Let $X_1, X_2, \dots$ be independent random variables, each being $N(\theta, 1)$. A point estimate $z$ is wanted for the mean $\theta$. Let

$$W_1(n) = 1 - \frac{1}{n} \quad \text{for } n = 1, 2, \dots,$$
$$W_1(n) = 0 \quad \text{for } n = 0,$$
$$W_2(z, \theta) = (z - \theta)^2.$$

Making the necessary substitutions in Section 5,

$$r_c(m) = c + \frac{1 - c}{m} \quad \text{for } m = 1, 2, \dots.$$

Clearly $r_1(m): m = 1, 2, \dots$ is constant. Thus any procedure in which the sample size is at least 1 and the estimate $z = \frac{1}{n}\sum_{i=1}^{n} x_i$ is used for $\theta$ is minimax for the problem (iii) with $c = 1$. If $c < 1$, problem (iii) has no solution. (The larger the sample size, the smaller is the risk.) If $c > 1$, $r_c(m): m = 1, 2, \dots$ is minimised by $m = 1$ only. (In both these examples the possibility $n = 0$ is excluded because $\sup_\theta$ (risk) is then $\infty$.)
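The three regimes $c < 1$, $c = 1$ and $c > 1$ are easy to see numerically; the short sketch below (names hypothetical) evaluates $r_c(m) = cW_1(m) + 1/m$ exactly.

```python
from fractions import Fraction

def r(c, m):
    """r_c(m) = c*(1 - 1/m) + 1/m = c + (1 - c)/m for m >= 1."""
    return c * (1 - Fraction(1, m)) + Fraction(1, m)

for c in (Fraction(1, 2), Fraction(1), Fraction(2)):
    print(c, [r(c, m) for m in range(1, 6)])
# c = 1/2: strictly decreasing in m, so no minimum exists;
# c = 1: constant 1; c = 2: strictly increasing, minimised at m = 1 only.
```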
6 


7. Example: rectangular. Let $X_1, X_2, \dots$ be a sequence of independent random variables, each being $R(\theta - \frac12, \theta + \frac12)$, i.e., rectangular with range $\theta - \frac12$ to $\theta + \frac12$. A point estimate $z$ is wanted for the parameter $\theta$. Let

$$W_1(n, z, \theta) = n, \qquad W_2(n, z, \theta) = W(z - \theta),$$

where $W$ is a non-decreasing function of $|z - \theta|$. The three classes of minimax problems are

(i) subject to $\sup_\theta E_\theta(n) \le M$, minimise $\sup_\theta E_\theta W(z - \theta)$,

(ii) subject to $\sup_\theta E_\theta W(z - \theta) \le L$, minimise $\sup_\theta E_\theta(n)$,

(iii) minimise $\sup_\theta \{cE_\theta(n) + E_\theta W(z - \theta)\}$.


NOTE. The problems (iii) are solved by Wald in [3] for the case $W(z - \theta) = (z - \theta)^2$. We will show that Wald's solution holds for any $W(z - \theta)$ which is non-decreasing in $|z - \theta|$, and will obtain solutions of (i) and (ii). In addition, admissibility results will be proved as in Section 5.

The remainder of this section will be concerned with proving the following 
results. 

THEOREM. The following procedures are admissible solutions of problem (iii) among the class of all procedures with continuous risk functions. If $\phi^* = \sup_a W(a) - 2\int_0^{1/2} W(a)\,da - c < 0$, take 0 observations and estimate 0 for $\theta$. If $\phi^* > 0$, take at least one observation and after the $m$th observation ($m = 1, 2, \dots$) compute the range of $x_1, \dots, x_m$: if the range exceeds $1 - \bar t$, stop and estimate the mid-range for $\theta$; if it is less than $1 - \bar t$, take another observation; if it equals $1 - \bar t$, do either. If $\phi^* = 0$, follow either procedure. (Here $\bar t$, to be defined later, is a constant depending on $c$ and $W$.) The class of procedures so obtained, $0 < c < \infty$, provides admissible solutions among the class of procedures with continuous risk functions for all problems (i) and (ii).

Solutions are found for problems (iii) by first finding Bayes solutions for the corresponding problems (iii)$'$ when $\theta$ has a priori distribution $R(a, b)$. The Bayes problem is to find a sequential estimation procedure $S$ which minimises the risk

$$E\{r(\theta \mid S) \mid \theta \sim R(a, b)\} = \frac{1}{b - a}\int_a^b \{cE_\theta(n \mid S) + E_\theta(W \mid S)\}\,d\theta.$$







Let us first determine what our estimate $z$ should be if we stop after having observed $x_1, \dots, x_m$. For this we will need to know the a posteriori distribution $p(\theta \mid x_1, \dots, x_m)$. Writing $u_m = \min(x_1, \dots, x_m)$ and $v_m = \max(x_1, \dots, x_m)$, this distribution is easily found to be $R(\bar u_m, \bar v_m)$, where $(\bar u_m, \bar v_m) = (a, b) \cap (v_m - \frac12, u_m + \frac12)$ for $m = 1, 2, \dots$, and $(\bar u_0, \bar v_0) = (a, b)$. Clearly a best estimate, i.e., one minimising the a posteriori risk $cm + \int W(z - \theta)\,p(\theta \mid x_1, \dots, x_m)\,d\theta$, is $z = \frac{\bar u_m + \bar v_m}{2}$, the mid-point of $(\bar u_m, \bar v_m)$. The minimum value is

$$r_m = cm + \frac{1}{\bar v_m - \bar u_m}\int_{\bar u_m}^{\bar v_m} W\left(\frac{\bar u_m + \bar v_m}{2} - \theta\right)d\theta = cm + \frac{2}{t_m}\int_0^{t_m/2} W(a)\,da,$$

where $t_m = \bar v_m - \bar u_m$ for $m = 0, 1, 2, \dots$.
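As an illustration of these formulas, the sketch below computes the posterior interval, the mid-point estimate and $r_m$ from a list of observations. It assumes $W(a) = a^2$ purely for the example, so that $\int_0^s W(a)\,da = s^3/3$ and $r_m = cm + t_m^2/12$ (the variance of a uniform distribution of length $t_m$); the caller supplies that antiderivative, and the function names are hypothetical.

```python
def posterior_interval(xs, a, b):
    """Support of the posterior: (a, b) intersected with (max(xs) - 1/2, min(xs) + 1/2)."""
    if not xs:
        return a, b
    return max(a, max(xs) - 0.5), min(b, min(xs) + 0.5)

def estimate_and_risk(xs, a, b, c, W_int):
    """Mid-point estimate and r_m = c*m + (2/t_m) * W_int(t_m / 2),
    where W_int(s) must return the integral of W over [0, s]."""
    lo, hi = posterior_interval(xs, a, b)
    t = hi - lo                     # t_m > 0 for observations consistent with (a, b)
    return (lo + hi) / 2, c * len(xs) + 2 / t * W_int(t / 2)

def cube_over_3(s):
    """Antiderivative for W(a) = a**2."""
    return s**3 / 3

z, r_m = estimate_and_risk([0.25, 0.5], -10.0, 10.0, 0.01, cube_over_3)
print(z, r_m)   # mid-point 0.375, r_m = 0.02 + 0.75**2 / 12
```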
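To determine an optimum stopping rule we will need to know, for all $t > 0$, the conditional expected value of $r_{m+1}$ given $t_m = t$. Now, given $t_m = t$, the next observation $x_{m+1}$ has density

$$p(x_{m+1} \mid t_m = t) = \frac{1}{t}\cdot\left(\text{length of } (\bar u_m, \bar v_m) \cap \left(x_{m+1} - \tfrac12, x_{m+1} + \tfrac12\right)\right),$$

and this length is exactly $t_{m+1}$. From this it is easy to show that

$$E(r_{m+1} \mid t_m = t) = c(m + 1) + \frac{2(t - 1)}{t}\int_0^{1/2} W(a)\,da + \frac{4}{t}\int_0^1\left[\int_0^{x/2} W(a)\,da\right]dx$$

for $t \ge 1$, and that for $t \le 1$

$$E(r_{m+1} \mid t_m = t) = c(m + 1) + \frac{2(1 - t)}{t}\int_0^{t/2} W(a)\,da + \frac{4}{t}\int_0^t\left[\int_0^{x/2} W(a)\,da\right]dx.$$

Let

$$\phi(t) = cm + \frac{2}{t}\int_0^{t/2} W(a)\,da - E(r_{m+1} \mid t_m = t),$$

the expected decrease in a posteriori risk due to taking $m + 1$ instead of $m$ observations when $t_m = t$. We have

$$\phi(t) = \frac{2}{t}\int_0^{t/2} W(a)\,da + \left(\frac{2}{t} - 2\right)\int_0^{1/2} W(a)\,da - \frac{4}{t}\int_0^1\left[\int_0^{x/2} W(a)\,da\right]dx - c$$

for $t \ge 1$; and for $t \le 1$,

$$\phi(t) = 2\int_0^{t/2} W(a)\,da - \frac{4}{t}\int_0^t\left[\int_0^{x/2} W(a)\,da\right]dx - c.$$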

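Now $W(a)$, being non-decreasing for $a \ge 0$, has at most a denumerable set of discontinuities. If $W(a)$ is continuous at $a = t/2$ we have, for $t > 1$ (using $\int_0^1\left[\int_0^{x/2} W(a)\,da\right]dx = \int_0^{1/2}(1 - 2a)\,W(a)\,da$),

$$\phi'(t) = \frac{1}{t^2}\left\{tW\left(\frac t2\right) - 2\int_0^{t/2} W(a)\,da - 2\int_0^{1/2} W(a)\,da + 4\int_0^1\left[\int_0^{x/2} W(a)\,da\right]dx\right\}$$
$$= \frac{1}{t^2}\left\{2\int_0^{t/2}\left[W\left(\frac t2\right) - W(a)\right]da + \int_0^{1/2}(2 - 8a)\,W(a)\,da\right\}$$
$$\ge \frac{1}{t^2}\left\{2\int_0^{1/2}\left[W\left(\frac12\right) - W(a)\right]da + \int_0^{1/2}(2 - 8a)\,W(a)\,da\right\} = \frac{8}{t^2}\int_0^{1/2} a\left[W\left(\frac12\right) - W(a)\right]da \ge 0;$$

and if $t < 1$,

$$\phi'(t) = W\left(\frac t2\right) + \frac{4}{t^2}\int_0^t\left[\int_0^{x/2} W(a)\,da\right]dx - \frac{4}{t}\int_0^{t/2} W(a)\,da = \frac{8}{t^2}\int_0^{t/2} a\left[W\left(\frac t2\right) - W(a)\right]da \ge 0.$$

If $t/2$ is a discontinuity point of $W(a)$, the same inequalities hold for the one-sided derivatives of $\phi(t)$, both of which exist. We observe that these inequalities are strict unless $W(a)$ is constant on the open interval $(0, t/2)$. Noting that $\phi(t) \to -c$ as $t \to 0$, we have proved that $\phi(t)$ is continuous and non-decreasing for $t > 0$, being strictly increasing whenever $\phi(t) > -c$.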
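The monotonicity of $\phi$ and the root $\bar t$ can be checked numerically. The sketch below evaluates $\phi(t)$ directly from the two formulas above, with the inner integrals supplied through an antiderivative and the outer one by Simpson's rule, and then finds $\bar t$ by bisection. For the illustrative choice $W(a) = a^2$ (Wald's squared-error case) one can verify by hand that $\phi(t) = t^3/24 - c$ for $t \le 1$ and $\phi(t) = t^2/12 - 1/12 + 1/(24t) - c$ for $t \ge 1$, so for $c = 1/192$ the root is $\bar t = 1/2$. All names are hypothetical.

```python
def simpson(f, lo, hi, n=400):
    """Composite Simpson rule on [lo, hi] with an even number n of panels."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3

def phi(t, c, W_int):
    """phi(t) of Section 7; W_int(s) is the integral of W over [0, s]."""
    def F(x):                       # inner integral of W(a) from 0 to x/2
        return W_int(x / 2)
    if t <= 1:
        return 2 * W_int(t / 2) - 4 / t * simpson(F, 0.0, t) - c
    return (2 / t * W_int(t / 2) + (2 / t - 2) * W_int(0.5)
            - 4 / t * simpson(F, 0.0, 1.0) - c)

def t_bar(c, W_int, tol=1e-12, max_doublings=60):
    """Root of phi(t) = 0 by bisection, or None if phi stays <= 0 (phi* <= 0)."""
    lo, hi = 1e-12, 1.0
    for _ in range(max_doublings):
        if phi(hi, c, W_int) > 0:
            break
        lo, hi = hi, 2 * hi
    else:
        return None
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if phi(mid, c, W_int) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def cube_over_3(s):
    """Antiderivative of W(a) = a**2."""
    return s**3 / 3

print(t_bar(1 / 192, cube_over_3))   # close to 0.5
```

For a constant loss, $\phi(t) \equiv -c$ and the function returns `None`, matching the case in which no observation is worth taking.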

Hence $\phi(t) < 0$ for all $t$, or else $\phi(t) = 0$ has a unique root $\bar t$. Using also the fact that $t_{m+1} \le t_m$, we now obtain, by the methods of [3], the following results.

If $\phi(t) < 0$ for all $t$, a Bayes solution is: take 0 observations and estimate $\frac{a + b}{2}$ for $\theta$. If $\phi(t) > 0$ for some $t$, a Bayes solution is: after the $m$th observation ($m = 0, 1, 2, \dots$) compute $t_m = \bar v_m - \bar u_m$. If $t_m < \bar t$, stop and estimate $\frac{\bar u_m + \bar v_m}{2}$ for $\theta$; if $t_m > \bar t$, take another observation; if $t_m = \bar t$, do either.
Consider now the following procedures $S_0$. If $\phi^* = \sup_a W(a) - 2\int_0^{1/2} W(a)\,da - c < 0$, take 0 observations and estimate 0 for $\theta$. If $\phi^* > 0$, take at least one observation, and after each observation ($m = 1, 2, \dots$) compute $\tilde t_m = u_m + \frac12 - (v_m - \frac12) = u_m - v_m + 1$; if $\tilde t_m < \bar t$, stop and estimate $\frac{u_m + v_m}{2}$ for $\theta$; if $\tilde t_m > \bar t$, take another observation; and if $\tilde t_m = \bar t$, do either. Finally, if $\phi^* = 0$, use either of these two procedures.
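Procedure $S_0$ is a simple sequential rule, sketched below for an observation sequence supplied in advance. The tie case $\tilde t_m = \bar t$, where the text allows either action, is resolved here by stopping, and the case $\phi^* = 0$ by taking observations; both choices, like the function name, are assumptions of the sketch. Since $\tilde t_m = 1 - (v_m - u_m)$, stopping when $\tilde t_m < \bar t$ is the same as stopping when the range exceeds $1 - \bar t$, as in the Theorem.

```python
def run_S0(xs, t_bar, phi_star):
    """Apply S_0 to the observations xs, consumed in order.

    Returns (number of observations used, estimate of theta).
    """
    if phi_star < 0:
        return 0, 0.0                          # take no observations, estimate 0
    u = v = None
    for m, x in enumerate(xs, start=1):
        u = x if u is None else min(u, x)      # u_m = min(x_1, ..., x_m)
        v = x if v is None else max(v, x)      # v_m = max(x_1, ..., x_m)
        t_tilde = u - v + 1                    # length of (v_m - 1/2, u_m + 1/2)
        if t_tilde <= t_bar:                   # tie resolved by stopping
            return m, (u + v) / 2
    raise ValueError("observations exhausted before the stopping rule fired")

print(run_S0([0.1, 0.9, 0.5], t_bar=0.5, phi_star=1.0))   # stops after 2 observations
```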


If $\phi^* > 0$, it is easy to show that $E_\theta(n \mid S_0)$, $E_\theta(W \mid S_0)$ and

$$r(\theta \mid S_0) = cE_\theta(n \mid S_0) + E_\theta(W \mid S_0) \equiv r$$

are all constants. Also, for any particular $c$, there is always an $S_0$ for which $E_\theta(n \mid S_0)$ is constant.

Let $S_k$ be a Bayes procedure when $\theta$ has the distribution $\xi_k = R(-k, k)$ a priori. If $\phi^* < 0$, then for all $k$ the procedure $S_k$ is: take 0 observations and estimate 0 for $\theta$; it thus coincides with an $S_0$. (Other possible $S_0$ have the same $\sup_\theta r(\theta \mid S_0)$.) If $\phi^* > 0$, then for all $k$ sufficiently large the procedure $S_k$ coincides with $S_0$ for $-(k - 1) \le \theta \le k - 1$. Taking a sequence of $S_k$'s with $k \to \infty$, it easily follows from Lemma 6 that all procedures $S_0$ are minimax for the problem (iii) in question.

By the same methods as are used in Section 5 it is easy to show that the procedures $S_0$ obtained above provide solutions for all solvable problems (i) and (ii).

In the case $\phi^* > 0$, for the procedure $S_0$ to be non-admissible for the problem (iii) for which it is minimax, there must exist a procedure $S'_0$ having risk function

$$r(\theta \mid S'_0) \le r \quad \text{for all } \theta,$$

with strict inequality for at least one $\theta$ and so, if $r(\theta \mid S'_0)$ is continuous, for an interval of values of $\theta$. We can therefore find $h > \frac12$ such that

$$\frac{1}{2h - 1}\int_{-(h - 1/2)}^{h - 1/2} r(\theta \mid S'_0)\,d\theta = a < r.$$

Now for $\alpha = 0, \pm 2, \pm 4, \dots$ define the procedure $S^*_\alpha$ as follows. If $x_1, x_2, \dots$ are observed, use the decision procedure $S'_0$ for the sequence $x_1 - \alpha h, x_2 - \alpha h, \dots$ and add $\alpha h$ to the resulting estimate. We clearly have

$$r(\theta \mid S^*_\alpha) = r(\theta - \alpha h \mid S'_0).$$







Now define the procedure $S^*$ as follows. Take at least one observation. If $x_1 \in ((\alpha - 1)h, (\alpha + 1)h]$, $\alpha = 0, \pm 2, \pm 4, \dots$, use the procedure $S^*_\alpha$. If $\theta \in ((\alpha - 1)h + \frac12, (\alpha + 1)h - \frac12)$, then $x_1 \in ((\alpha - 1)h, (\alpha + 1)h)$ and so the procedure $S^*$ reduces to $S^*_\alpha$. Hence $r(\theta \mid S^*)$ coincides with $r(\theta \mid S^*_\alpha)$ for

$$\theta \in \left((\alpha - 1)h + \tfrac12, (\alpha + 1)h - \tfrac12\right), \quad \alpha = 0, \pm 2, \pm 4, \dots,$$

and $r(\theta \mid S^*) \le r$ for all $\theta$. Therefore

$$\frac{1}{2(2k + 1)h}\int_{-(2k + 1)h}^{(2k + 1)h} r(\theta \mid S^*)\,d\theta \le \frac{(2h - 1)a + r}{2h} = r - \frac{(r - a)(2h - 1)}{2h}.$$

But if $\theta$ has the distribution $R(-(2k + 1)h, (2k + 1)h)$ a priori, the Bayes solution coincides with $S_0$ for $\theta \in (-(2k + 1)h + 1, (2k + 1)h - 1)$. We therefore have for this a priori distribution

$$\text{Bayes risk} \ge \frac{2(2k + 1)h - 2}{2(2k + 1)h}\,r = r - \frac{r}{(2k + 1)h}.$$

For $k$ sufficiently large this clearly exceeds $r - (2h - 1)(r - a)/2h$, contradicting the above inequality on the Bayes risk. It follows that no such $S'_0$ as assumed can exist, and therefore that the procedure $S_0$ is admissible, among the class of procedures with continuous risk functions, for the problems (iii) for which it is minimax and also, by Lemma 7, for the problems (i) and (ii) for which it is minimax.

If $W$ is continuous and the number of observations is bounded, it can be shown that $r(\theta \mid S'_0)$ is continuous. Thus if $W$ is continuous, $S_0$ is admissible, among the class of procedures with $n$ bounded, for the three problems.

It remains to consider admissibility for procedures $S_0$ where $\phi^* \le 0$. Proofs for these cases can be given in the same way as for the corresponding cases in Section 5.

The solution for this example still works if we replace $W_1(n) = n$ by some other $W_1(n)$, but only so long as the resulting function $\phi(t)$ is non-decreasing.

NOTE. In the above examples, a procedure is called cogredient if for every $c$ the same number of observations is taken for $x_1 + c, x_2 + c, \dots$ as for $x_1, x_2, \dots$, and $z(x_1 + c, \dots, x_n + c) = z(x_1, \dots, x_n) + c$. Such procedures have constant risk functions, so it follows that all the constant-risk procedures obtained in Sections 5, 6 and 7 have uniformly minimum risk among all cogredient procedures for the problems for which they are minimax.

The author wishes to express his appreciation to Professor E. L. Lehmann for valuable advice and assistance in the preparation of this paper, and also to Jack C. Kiefer, who read the manuscript and pointed out an important correction.







REFERENCES 


[1] C. Stein and A. Wald, "Sequential confidence intervals for the mean of a normal distribution with known variance," Annals of Math. Stat., Vol. 18 (1947), pp. 427-433.

[2] J. Wolfowitz, "Minimax estimates of the mean of a normal distribution with known variance," Annals of Math. Stat., Vol. 21 (1950), pp. 218-230.

[3] A. Wald, Statistical Decision Functions, John Wiley and Sons, 1950.

[4] J. Hodges and E. Lehmann, "Extensions and applications of the Cramér-Rao inequality," Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, 1951.





ON MINIMUM VARIANCE IN NONREGULAR ESTIMATION 


By R. C. Davis
U.S. Naval Ordnance Test Station, China Lake, California 


1. Summary. A case of nonregular estimation arises in attempting to estimate a single unknown parameter, $\theta$, in the probability distribution of a single chance variable in which one or both of the extremities of the range of the distribution are functions of the unknown parameter. The case treated in this paper is the one in which a probability density of exponential type exists. When one extremity alone of the range depends non-trivially upon $\theta$, a necessary and sufficient condition is given in order that a single order statistic be a sufficient statistic for $\theta$. In this case conditions are given for the existence of a unique unbiased estimate of $\theta$ possessing minimum variance uniformly in $\theta$. In the case in which both extremities of the range depend upon $\theta$, a necessary and sufficient condition is given that the smallest and largest order statistics constitute a set of sufficient statistics for $\theta$. In this case Pitman [1] has shown that a single sufficient statistic exists if one extremity of the range is a monotone decreasing function of the other extremity.¹ It is shown that under the above condition a unique unbiased estimate exists possessing minimum variance. Moreover a surmise of Pitman is proved that only under this condition does a single sufficient statistic exist. When a single sufficient statistic does not exist, an unbiased estimate of a known function of $\theta$ is obtained which has less variance than any analytic function of the set of sufficient statistics for $\theta$.


2. Introduction. Let $X$ be a chance variable assuming values $x$ in a one-dimensional Euclidean space $R_1$, and let $X$ possess a probability density function $f(x, \theta)$ depending on a single unknown parameter $\theta$ which lies in $\Omega$, a subset of $R_1$. Denote by $a(\theta)$ and $b(\theta)$ the lower and upper extremities of the range of $f(x, \theta)$. We treat the cases in which either one or both of the extremities of the range depend nontrivially upon $\theta$. For each $\theta \in \Omega$ denote by $R^*(\theta)$ the subset of $R_1$ satisfying $a(\theta) \le x \le b(\theta)$, and by $R^{**}(\theta)$ the complement of $R^*(\theta)$ in $R_1$.
We make the following assumptions: 

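ASSUMPTION A.

$$f(x, \theta) = 0 \quad \text{for all } (x, \theta) \text{ on } R^{**}(\theta) \times \Omega;$$
$$f(x, \theta) = e^{\theta K(x) + S(x) + T(\theta)} \quad \text{for all } (x, \theta) \text{ on } R^*(\theta) \times \Omega,$$

where $T(\theta)$ is a real single-valued continuous function of $\theta$ at all points of $\Omega$, and $S(x)$, $K(x)$ are real single-valued continuous functions of $x$ defined almost everywhere on $R_1$.

ASSUMPTION B. $a(\theta)$ and $b(\theta)$ are continuous functions of $\theta$ satisfying, for all $\theta \in \Omega$, the inequality $a(\theta) < b(\theta)$.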


1 The author is deeply indebted to the referee for bringing to his attention the paper by 
Pitman and for many other helpful suggestions. 







44 R. C. DAVIS 


The exponential type of frequency function assumed above is the type which Koopman [2] has shown to hold whenever a sufficient statistic for $\theta$ exists. We do not, however, require any of his results in this paper.

For convenience in notation we write $P(x) = e^{S(x)}$ and $Q(\theta) = e^{T(\theta)}$, so that obviously we have the relation

$$[Q(\theta)]^{-1} = \int_{a(\theta)}^{b(\theta)} P(x)\,e^{\theta K(x)}\,dx.$$


Furthermore, if an estimate of $\theta$ is a continuous function of $n$ independent sample values, is unbiased, and possesses minimum variance uniformly in $\theta \in \Omega$, we term it a best estimate of $\theta$.


3. One extremity of the range depending upon $\theta$. First we treat the case in which only one extremity of the range depends upon the unknown parameter $\theta$. To fix the argument we assume that the upper extremity $b(\theta)$ depends upon $\theta$, and the lower extremity $a$ is independent of $\theta$. The results of this section are extended in an obvious manner to the case in which the lower extremity alone depends upon $\theta$.

THEOREM 1. Let $x_1, x_2, \dots, x_n$ be the values of $n$ independent drawings from a population having the probability density function $f(x, \theta)$ satisfying Assumptions A and B, and in which only the upper extremity of the range depends upon $\theta$. The necessary and sufficient condition that the $n$th order statistic, denoted by $x_{(n)}$, be a sufficient statistic for $\theta$ is that

$$f(x, \theta) = P(x)\,Q(\theta) \quad \text{for all } (x, \theta) \text{ in } R^*(\theta) \times \Omega.$$


PROOF OF NECESSITY. Suppose that in a sample of $n$ independent observations the $n$th order statistic, $x_{(n)}$, is a sufficient statistic for $\theta$. It follows from the definition of sufficiency that

$$\prod_{i=1}^{n} f(x_i, \theta) = g(x_{(n)}; \theta)\,h(x_{(1)}, \dots, x_{(n-1)} \mid x_{(n)}; \theta),$$

where $g(x_{(n)}; \theta)$ is the frequency function of $x_{(n)}$, and $h(x_{(1)}, \dots, x_{(n-1)} \mid x_{(n)}; \theta)$ denotes the conditional frequency function of the order statistics $x_{(1)}, \dots, x_{(n-1)}$, given a fixed value of $x_{(n)}$, and is independent of $\theta$. It is well known from the theory of order statistics that $g(x_{(n)}; \theta)$ has the form

$$g(x_{(n)}; \theta) = n[F(x_{(n)})]^{n-1} f(x_{(n)}, \theta) = nP(x_{(n)})[Q(\theta)]^{n} e^{\theta K(x_{(n)})}\left[\int_a^{x_{(n)}} P(\eta)\,e^{\theta K(\eta)}\,d\eta\right]^{n-1},$$

where $F(x) = \int_a^{x} f(\eta, \theta)\,d\eta$. It follows from the above that

$$(1)\qquad h(x_{(1)}, \dots, x_{(n-1)} \mid x_{(n)}; \theta) = \frac{\exp\left[\theta\sum_{i=1}^{n-1} K(x_{(i)})\right]\prod_{i=1}^{n-1} P(x_{(i)})}{n\left[\int_a^{x_{(n)}} P(\eta)\,e^{\theta K(\eta)}\,d\eta\right]^{n-1}},$$





ON MINIMUM VARIANCE 45 
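where $h(x_{(1)}, \dots, x_{(n-1)} \mid x_{(n)}; \theta)$ is independent of $\theta$. Differentiating equation (1) partially with respect to $\theta$, substituting the value of $h(x_{(1)}, \dots, x_{(n-1)} \mid x_{(n)}; \theta)$ from (1) and placing $\partial h/\partial\theta = 0$, we obtain after some simple algebra

$$(2)\qquad \int_a^{x_{(n)}} K(\eta)P(\eta)\,e^{\theta K(\eta)}\,d\eta = \frac{1}{n - 1}\sum_{i=1}^{n-1} K(x_{(i)})\int_a^{x_{(n)}} P(\eta)\,e^{\theta K(\eta)}\,d\eta.$$

Since $f(x, \theta) \ge 0$ for all $x$ in $R_1$, it follows that $P(\eta)e^{\theta K(\eta)} \ge 0$ for $a \le \eta \le x_{(n)}$. Moreover we obtain from the first mean value theorem for integrals that

$$\int_a^{x_{(n)}} K(\eta)P(\eta)\,e^{\theta K(\eta)}\,d\eta = K(\xi)\int_a^{x_{(n)}} P(\eta)\,e^{\theta K(\eta)}\,d\eta,$$

where $a \le \xi \le x_{(n)}$. Equation (2) then reduces to the form

$$(3)\qquad K(\xi) = \frac{1}{n - 1}\sum_{i=1}^{n-1} K(x_{(i)}).$$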


It is noted that the only sample value on which $\xi$ depends is $x_{(n)}$. Equation (3) is valid for every $x_{(1)}, \dots, x_{(n-1)}$ satisfying the inequalities $x_{(1)} \le x_{(2)} \le \dots \le x_{(n-1)} \le x_{(n)}$, with the $x_{(i)}$ assuming values in $R^*(\theta)$. Let $x_{(n)}$ take some fixed value arbitrarily close to $b(\theta)$. (If $f(b(\theta), \theta) \ne 0$, we can of course let $x_{(n)} = b(\theta)$.) Also let $x$ be any number satisfying the inequality $a \le x \le x_{(n)}$. Now let $x_{(1)} = x_{(2)} = \dots = x_{(n-1)} = x$, and we obtain from (3) the relation

$$(4)\qquad K(x) = K(\xi).$$

Since this relation is true for every $x$ in the interval $a \le x < x_{(n)}$, it follows that $K(x)$ is a constant in the interval $a \le x < x_{(n)}$. (Again, if we assume $f(b(\theta), \theta) \ne 0$, we can let $x_{(n)} = b(\theta)$, and it follows that $K(x)$ would be a constant in the closed interval $a \le x \le b(\theta)$.) Therefore, necessity is proved.

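PROOF OF SUFFICIENCY. This proof is extremely simple. If $f(x, \theta) = P(x)Q(\theta)$, we have

$$h(x_{(1)}, \dots, x_{(n-1)} \mid x_{(n)}; \theta) = \frac{[Q(\theta)]^n P(x_{(1)}) \cdots P(x_{(n)})}{n[Q(\theta)]^n P(x_{(n)})\left[\int_a^{x_{(n)}} P(\eta)\,d\eta\right]^{n-1}},$$

which is independent of $\theta$. Hence $x_{(n)}$ is a sufficient statistic for $\theta$. This completes the proof of Theorem 1.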

Before proceeding to the problem of constructing a best estimate for $\theta$, we will use a theorem due to Blackwell [3] which will enable us to restrict ourselves to the class of unbiased estimates of $\theta$ which are functions of the sufficient statistic for $\theta$. Blackwell’s results are applicable to a much more general situation than we are considering here, and the results needed can be obtained in a different manner. Nevertheless we will summarize briefly the result which we need. He has proved that if $x$ is any chance variable and $y$ is any numerical







46 R. C. DAVIS 


chance variable for which $E(y)$ and $E\{y - E(y)\}^2$ are finite, and $f(x)$ is any real valued function for which $E[f(x)y]$ is finite, then $\sigma^2_{E(y \mid x)}$ is finite, where $E(y \mid x)$ denotes the conditional expected value of $y$ given $x$. Moreover he proves that $E[f(x)E(y \mid x)] = E[f(x)y]$ and $\sigma^2_{E(y \mid x)} \le \sigma^2_y$, with equality holding only if $y = E(y \mid x)$ with probability one.

As a particular application of Blackwell’s result it follows that if a sufficient statistic $S$ exists, and if $t$ is any unbiased estimate of $\theta$, then $\alpha(S) = E[t \mid S]$ is an unbiased estimate of $\theta$ with $\sigma^2[\alpha(S)] \le \sigma^2_t$. It follows that we can restrict ourselves (in the case in which only the upper extremity of the range depends on $\theta$) to the class of functions of the sufficient statistic $x_{(n)}$ which yield sufficient statistics. If we can obtain out of this class a unique function of $x_{(n)}$ which is unbiased and possesses minimum variance in this class, we will obtain an unbiased estimate of $\theta$ possessing minimum variance.
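Blackwell's inequality can be seen numerically in the rectangular case $f(x,\theta) = 1/\theta$ on $0 \le x \le \theta$. The sketch below is an illustration added for this purpose, not part of the original argument; the function name `rao_blackwell_demo` and the values $\theta = 3$, $n = 5$ are arbitrary choices. It starts from the crude unbiased estimate $t = 2x_1$. Given $x_{(n)}$, the observation $x_1$ equals $x_{(n)}$ with probability $1/n$ and is otherwise uniform on $(0, x_{(n)})$, so $E[t \mid x_{(n)}] = \frac{n+1}{n}x_{(n)}$.

```python
import random
import statistics


def rao_blackwell_demo(theta=3.0, n=5, reps=100_000, seed=1):
    """Compare the crude unbiased estimate t = 2*x_1 with its conditional
    expectation given the sufficient statistic x_(n), for f(x, theta) = 1/theta
    on [0, theta].  Returns (mean_t, var_t, mean_rb, var_rb)."""
    rng = random.Random(seed)
    crude, improved = [], []
    for _ in range(reps):
        xs = [rng.uniform(0.0, theta) for _ in range(n)]
        crude.append(2.0 * xs[0])               # unbiased, but uses one observation
        improved.append((n + 1) / n * max(xs))  # E[2 x_1 | x_(n)] = (n+1)/n * x_(n)
    return (statistics.fmean(crude), statistics.pvariance(crude),
            statistics.fmean(improved), statistics.pvariance(improved))


if __name__ == "__main__":
    m_t, v_t, m_rb, v_rb = rao_blackwell_demo()
    print(f"2*x_1:          mean {m_t:.3f}  variance {v_t:.3f}")
    print(f"(n+1)/n*x_(n):  mean {m_rb:.3f}  variance {v_rb:.3f}")
```

With $\theta = 3$ and $n = 5$ both means should be near 3, while the variances should be near $\theta^2/3 = 3$ and $\theta^2/(n(n+2)) = 9/35 \approx 0.26$, which illustrates $\sigma^2[\alpha(S)] \le \sigma^2_t$.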


4. Derivation of the best estimate for $\theta$ when the range varies from $a$ to $b(\theta)$.
If we make the transformation of parameters $\varphi = [Q(\theta)]^{-1}$, matters are simplified considerably. If we assume that the function $\varphi(\theta)$ possesses a unique inverse $\theta(\varphi)$ and let $c(\varphi) = b[\theta(\varphi)]$, we have the condition that $\alpha(x_{(n)})$ is an unbiased estimate of $\varphi$ in the form


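(5) $$\int_a^{c(\varphi)}\alpha(x_{(n)})\,g(x_{(n)};\varphi)\,dx_{(n)} = \varphi.$$

This reduces to the condition

$$\int_a^{c(\varphi)}\alpha(x_{(n)})P(x_{(n)})\left[\int_a^{x_{(n)}}P(\eta)\,d\eta\right]^{n-1}dx_{(n)} = \frac{\varphi^{n+1}}{n}.$$

If we use a new variable of integration $u$, where $u = \int_a^{x_{(n)}}P(\eta)\,d\eta$, and let $\alpha(x_{(n)}) = \psi(u)$, the condition of unbiasedness becomes

$$\int_0^{\varphi}\psi(u)\,u^{n-1}\,du = \frac{\varphi^{n+1}}{n}.$$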


Clearly the only solution of this integral equation which is an analytic function of $u$ is given by

$$\psi(u) = \left(1 + \frac{1}{n}\right)u.$$

Since this is the unique solution for all finite $\varphi$, it follows that

$$\left(1 + \frac{1}{n}\right)\int_a^{x_{(n)}}P(\eta)\,d\eta$$

is the only unbiased estimate of $\varphi$. Its variance can be obtained by a simple integration, and we obtain

$$\sigma^2 = \frac{\varphi^2}{n(n+2)}.$$
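A quick simulation can check both the unbiasedness of $(1 + 1/n)u$ and the variance formula. The sketch below assumes a hypothetical density that is not used in the text, $f(x,\theta) = 2x/\theta^2$ on $0 \le x \le \theta$, so that $P(x) = 2x$, $a = 0$, $b(\theta) = \theta$, $Q(\theta) = \theta^{-2}$, $\varphi = \theta^2$ and $u = x_{(n)}^2$. The function name and parameter values are illustrative only.

```python
import math
import random
import statistics


def phi_estimate_demo(theta=2.0, n=4, reps=200_000, seed=7):
    """Hypothetical example: f(x, theta) = 2x / theta**2 on [0, theta], i.e.
    P(x) = 2x, Q(theta) = theta**-2, phi = 1/Q(theta) = theta**2.
    Returns (mean of estimate, phi, variance of estimate, phi**2/(n(n+2)))."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        # inverse-CDF sampling: F(x) = (x/theta)**2, so x = theta*sqrt(U)
        x_max = max(theta * math.sqrt(rng.random()) for _ in range(n))
        u = x_max ** 2                       # u = integral of 2*eta from 0 to x_(n)
        estimates.append((1 + 1 / n) * u)    # psi(u) = (1 + 1/n) u
    phi = theta ** 2
    return (statistics.fmean(estimates), phi,
            statistics.pvariance(estimates), phi ** 2 / (n * (n + 2)))
```

With $\theta = 2$ and $n = 4$ the simulated mean should be close to $\varphi = 4$ and the variance close to $16/24 \approx 0.667$.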







If we wish to obtain an estimate for $\theta$ directly, the analysis is somewhat more complicated. Moreover it is necessary to make a further assumption to ensure that the unique unbiased estimate of $\theta$ among the class of functions of $x_{(n)}$ is also a sufficient statistic. We may state this assumption as follows:

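ASSUMPTION C. $b(\theta)$ is a strictly monotone function of $\theta$. If we define the following well defined functions,

$$u(x_{(n)}) = \int_a^{x_{(n)}}P(\eta)\,d\eta, \qquad \beta(x_{(n)}) = b^{-1}(x_{(n)}),$$

the functions $u(x_{(n)})$ and $\beta(x_{(n)})$ satisfy the following condition:

$$u\,\frac{d}{du}\left[\ln\left|\frac{d\beta}{du}\right|\right] > -2 \quad (\text{if } b(\theta) \text{ is strictly monotone increasing}),$$

$$u\,\frac{d}{du}\left[\ln\left|\frac{d\beta}{du}\right|\right] < -2 \quad (\text{if } b(\theta) \text{ is strictly monotone decreasing}).$$

Moreover, the parameter set $\Omega$ is the interval defined by $\theta \ge \theta_0$ when $b(\theta)$ is strictly monotone increasing and the interval $\theta \le \theta_0$ when $b(\theta)$ is strictly monotone decreasing. $\theta_0$ satisfies the equation $\beta(a) = \theta_0$, so that $b(\theta_0) = a$.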

Let $\alpha(x_{(n)})$ now represent a function of the sufficient statistic $x_{(n)}$. The condition that $\alpha$ be an unbiased estimate is expressed in the form

(6) $$\int_a^{b(\theta)}\alpha(x_{(n)})\,g(x_{(n)};\theta)\,dx_{(n)} = \theta$$

for every $\theta \in \Omega$. This reduces to the condition

$$\int_a^{b(\theta)}\alpha(x_{(n)})P(x_{(n)})\left[\int_a^{x_{(n)}}P(\eta)\,d\eta\right]^{n-1}dx_{(n)} = \frac{\theta}{n[Q(\theta)]^{n}}.$$

If we make the same substitution used before, namely $u = \int_a^{x_{(n)}}P(\eta)\,d\eta$, and let $\alpha(x_{(n)}) = \psi(u)$, the condition of unbiasedness becomes

(7) $$\int_0^{1/Q(\theta)}\psi(u)\,u^{n-1}\,du = \frac{\theta}{n[Q(\theta)]^{n}}.$$
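It follows from Assumptions B and C that $dQ/d\theta$, and hence the derivatives of both sides of (7), exist almost everywhere in $\Omega$. Hence, differentiating (7), we obtain after simple algebra

$$\psi\left(\frac{1}{Q(\theta)}\right) = \theta - \frac{1}{n\,\dfrac{d}{d\theta}\ln Q(\theta)}$$

for $\theta \in \Omega$. Since $\Omega$ is an interval having $\theta_0$ as one end point, we obtain after some manipulation (putting $\theta = \beta(x_{(n)})$, for which $1/Q(\theta) = u(x_{(n)})$) the expression

(8) $$\alpha(x_{(n)}) = \beta(x_{(n)}) + \frac{u}{n}\,\frac{d\beta(x_{(n)})}{du(x_{(n)})},$$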
where $\beta(x_{(n)})$ is the function inverse to $b(x_{(n)})$, denoted in Assumption C as $b^{-1}(x_{(n)})$. $\alpha(x_{(n)})$ is the only continuous function of $x_{(n)}$ which is an unbiased estimate of $\theta$. In order to ensure that $\alpha(x_{(n)})$ is also a sufficient statistic we must be certain that $\alpha(x_{(n)})$ has a unique inverse $\alpha^{-1}(x_{(n)})$. If we take the case in which $b(\theta)$ is strictly monotone increasing, this condition becomes

(9) $$\frac{d\alpha}{dx_{(n)}} = \frac{d\beta}{du}\,\frac{du}{dx_{(n)}}\left[1 + \frac{1}{n}\left(1 + u\,\frac{d^2\beta/du^2}{d\beta/du}\right)\right] > 0.$$

If Assumption C holds, $\alpha(x_{(n)})$ is a sufficient statistic for $n \ge 1$. Finally, applying Blackwell’s theorem, we conclude that $\alpha(x_{(n)})$ given by (8) is the best estimate of $\theta$. From (9) it is obvious that if the function $u\,\dfrac{d^2\beta/du^2}{d\beta/du}$ in (9) is a bounded function of $x_{(n)}$ for $a \le x_{(n)} \le b(\theta)$, $\theta \in \Omega$, then for $n$ sufficiently large $\alpha(x_{(n)})$ is a sufficient statistic and hence is the best estimate of $\theta$, assuming only the strict monotonicity of the function $b(\theta)$.

4a. Examples. 

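Rectangular Distribution. Let

$$f(x,\theta) = \frac{1}{\theta}, \quad 0 \le x \le \theta; \qquad f(x,\theta) = 0 \quad \text{otherwise}.$$

Since $P(x) = 1$ and $b(\theta) = \theta$, we obtain $u = x_{(n)}$ and $\beta = x_{(n)}$, so that

$$\alpha(x_{(n)}) = \beta(x_{(n)}) + \frac{u}{n}\,\frac{d\beta}{du} = \left(1 + \frac{1}{n}\right)x_{(n)}.$$

Its variance is given by the expression $\sigma_\alpha^2 = \dfrac{\theta^2}{n(n+2)}$.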
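A short derivation of the variance just quoted, included here as a sketch: $x_{(n)}$ has density $n x^{n-1}/\theta^n$ on $[0, \theta]$, so the moments of $x_{(n)}$ and of $\alpha = (1 + 1/n)x_{(n)}$ follow by elementary integration.

```latex
E\,x_{(n)}^{k} = \int_0^{\theta} x^{k}\,\frac{n x^{n-1}}{\theta^{n}}\,dx = \frac{n\,\theta^{k}}{n+k},
\qquad
E\,\alpha = \frac{n+1}{n}\cdot\frac{n\,\theta}{n+1} = \theta,
\\[4pt]
\sigma_\alpha^2 = \left(\frac{n+1}{n}\right)^{2}\left[\frac{n\,\theta^{2}}{n+2} - \frac{n^{2}\theta^{2}}{(n+1)^{2}}\right]
= \frac{(n+1)^{2}\theta^{2}}{n(n+2)} - \theta^{2}
= \frac{\theta^{2}}{n(n+2)} .
```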
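Exponential Distribution. Let

$$f(x,\theta) = e^{x-\theta}, \quad -\infty < x \le \theta; \qquad f(x,\theta) = 0, \quad x > \theta.$$

Since $P(x) = e^{x}$ and $b(\theta) = \theta$, we obtain $u = e^{x_{(n)}}$ and $\beta = x_{(n)}$. Hence

$$\alpha(x_{(n)}) = \beta(x_{(n)}) + \frac{u}{n}\,\frac{d\beta}{du} = x_{(n)} + \frac{1}{n}.$$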
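As with the rectangular case, the exponential result can be checked by simulation. In this illustrative sketch (the function name and parameter values are arbitrary), $x = \theta - E$ with $E$ a unit exponential variable has exactly the density $e^{x-\theta}$ on $x \le \theta$.

```python
import random
import statistics


def exponential_demo(theta=1.5, n=6, reps=200_000, seed=3):
    """f(x, theta) = exp(x - theta) for x <= theta, sampled as x = theta - E,
    E ~ Exp(1).  Returns mean and variance of alpha = x_(n) + 1/n."""
    rng = random.Random(seed)
    est = []
    for _ in range(reps):
        x_max = max(theta - rng.expovariate(1.0) for _ in range(n))
        est.append(x_max + 1 / n)          # alpha(x_(n)) = x_(n) + 1/n
    return statistics.fmean(est), statistics.pvariance(est)
```

Since $\theta - x_{(n)}$ is the smallest of $n$ independent unit exponential variables, it has mean $1/n$ and variance $1/n^2$; the simulated mean should therefore be near $\theta = 1.5$ and the variance near $1/36$.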
5. Both extremities of the range depending upon $\theta$.
THEOREM 2. Let $x_1, x_2, \ldots, x_n$ be the values of $n$ independent drawings from a population having the probability density function $f(x,\theta)$ satisfying Assumptions A and B, and in which both extremities of the range depend upon $\theta$. The necessary and sufficient condition that the first and $n$th order statistics, $x_{(1)}$ and $x_{(n)}$, be jointly sufficient statistics for $\theta$ is that

$$f(x,\theta) = P(x)Q(\theta) \quad \text{for all } (x,\theta) \text{ in } R^*(\theta) \times \Omega.$$







PROOF OF NECESSITY. Suppose that in a sample of $n$ independent observations the first and $n$th order statistics, $x_{(1)}$ and $x_{(n)}$, are jointly sufficient for $\theta$. It follows from the definition of joint sufficiency that

$$f(x_1,\theta)\cdots f(x_n,\theta) = g(x_{(1)},x_{(n)};\theta)\,h(x_{(2)},\ldots,x_{(n-1)} \mid x_{(1)},x_{(n)};\theta),$$

where $g(x_{(1)},x_{(n)};\theta)$ is the joint frequency function of $x_{(1)}$ and $x_{(n)}$, and $h(x_{(2)},\ldots,x_{(n-1)} \mid x_{(1)},x_{(n)};\theta)$ denotes the conditional frequency function of the order statistics $x_{(2)},\ldots,x_{(n-1)}$, given fixed values of $x_{(1)}$ and $x_{(n)}$, and is independent of $\theta$. It is well known from the theory of order statistics that $g(x_{(1)},x_{(n)};\theta)$ has the form

$$g(x_{(1)},x_{(n)};\theta) = n(n-1)[F(x_{(n)}) - F(x_{(1)})]^{n-2}f(x_{(1)},\theta)f(x_{(n)},\theta),$$

where $F(x_{(n)}) - F(x_{(1)}) = \int_{x_{(1)}}^{x_{(n)}}f(\eta,\theta)\,d\eta$. It follows from the above that

$$h(x_{(2)},\ldots,x_{(n-1)} \mid x_{(1)},x_{(n)};\theta) = \frac{\exp\left[\theta\sum_{i=2}^{n-1}K(x_{(i)})\right]\prod_{i=2}^{n-1}P(x_{(i)})}{n(n-1)\left[\int_{x_{(1)}}^{x_{(n)}}P(\eta)e^{\theta K(\eta)}\,d\eta\right]^{n-2}}.$$


The proof proceeds similarly to the one in Theorem 1, and we end up with a similar equation

(10) $$K(\xi) = \frac{1}{n-2}\sum_{i=2}^{n-1}K(x_{(i)}),$$

where $x_{(1)} \le \xi \le x_{(n)}$. Hence by a similar argument $K(x)$ is a constant in the open interval $a(\theta) < x < b(\theta)$. If $f(a(\theta),\theta)$ and $f(b(\theta),\theta)$ are unequal to zero, we can make the stronger statement that $K(x)$ is a constant in the closed interval $a(\theta) \le x \le b(\theta)$.

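PROOF OF SUFFICIENCY. Suppose that $f(x,\theta) = P(x)Q(\theta)$. Then

$$h(x_{(2)},\ldots,x_{(n-1)} \mid x_{(1)},x_{(n)};\theta) = \frac{[Q(\theta)]^{n}\prod_{i=1}^{n}P(x_{(i)})}{n(n-1)[Q(\theta)]^{n}P(x_{(1)})P(x_{(n)})\left[\int_{x_{(1)}}^{x_{(n)}}P(\eta)\,d\eta\right]^{n-2}},$$

which is independent of $\theta$. Hence $x_{(1)}$ and $x_{(n)}$ are jointly sufficient statistics for $\theta$. This completes the proof of Theorem 2.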

Blackwell’s theorem is applicable again to this case and enables us to restrict ourselves to the class of unbiased estimates which are sufficient statistics for $\theta$. Any unbiased sufficient statistic is a solution of the integral equation

(11) $$\int_{a(\theta)}^{b(\theta)}dx_{(n)}\int_{a(\theta)}^{x_{(n)}}\alpha(x_{(1)},x_{(n)})\,g(x_{(1)},x_{(n)};\theta)\,dx_{(1)} = \theta$$

for $\theta \in \Omega$.







Pitman has shown [1] that in the particular case $a(\theta) = \theta$, $b(\theta)$ a strictly monotone decreasing function of $\theta$, a sufficient statistic for $\theta$ exists. An independent proof of this statement is given here. Moreover, the distribution of this sufficient statistic is derived, and it is shown that there exists a unique unbiased estimate of $\theta$ in the class of all functions of the sufficient statistic.

Following Pitman, we simplify the discussion considerably by assuming $a(\theta) = \theta$. On the basis of Theorem 2 and Blackwell’s result we need only consider functions of the smallest and largest order statistics in our search for a best estimate. First we derive Pitman’s result independently. Let us consider the sample statistic

$$T = \min\{x_{(1)},\, b^{-1}(x_{(n)})\}.$$

We proceed first to find its probability distribution and then show that it is a sufficient statistic for $\theta$. Figure 1 shows a typical contour of constant $T$ in the $x_{(1)}, x_{(n)}$ plane.


[Figure 1: the $x_{(1)}, x_{(n)}$ plane, showing the curve $x_{(n)} = b(x_{(1)})$ and a typical contour of constant $T$.]


First it is clear from Assumption A that we may confine ourselves to the interior of the triangle shown in Fig. 1. Moreover, it is clear from the continuity and monotony of the function $b(\theta)$ that there exists a point with coordinates $(c, c)$ (where $b(c) = c$) which is independent of $\theta$ and is such that

$$\theta \le c \le b(\theta) \quad \text{for all } \theta \in \Omega.$$

From Assumption B, $\Omega \subset I$, where $I$ is the interval in $R_1$ given by $\theta \le c$. It is clear from the definition of $T$ that

$$T = b^{-1}(x_{(n)}) \quad \text{for all points above the curve } x_{(n)} = b(x_{(1)}),$$

$$T = x_{(1)} \quad \text{for all points below the curve } x_{(n)} = b(x_{(1)}),$$

$$T = x_{(1)} = b^{-1}(x_{(n)}) \quad \text{for all points on the curve } x_{(n)} = b(x_{(1)}).$$


A typical contour of constant $T$ is shown in the figure. If we denote as before by $g(x_{(1)}, x_{(n)})$ the joint frequency function of the order statistics $x_{(1)}$ and $x_{(n)}$, it follows that

(12) $$\Pr\{t < T < t + dt\} = \left[\int_t^{b(t)}g(x_{(1)},x_{(n)})\,dx_{(n)}\right]_{x_{(1)}=t}dt + \left[\int_t^{b(t)}g(x_{(1)},x_{(n)})\,dx_{(1)}\right]_{x_{(n)}=b(t)}[b(t) - b(t+dt)],$$

where the first integral is evaluated holding $x_{(1)} = t$ and the second integral holding $x_{(n)} = b(t)$. It follows from the continuity and monotony of $b(\theta)$ that if we restrict the parameter set $\Omega$ to be a bounded interval in $R_1$, $db/d\theta$ will exist everywhere except on a set of points having probability measure zero. In this case $T$ possesses a frequency function $w(t)$ almost everywhere. After performing the elementary integrations in (12), by noting that the integrands can be expressed as perfect differentials, we obtain

(13) $$w(t) = n[Q(\theta)]^{n}\left[\int_t^{b(t)}P(\eta)\,d\eta\right]^{n-1}\left[P(t) - \frac{db}{dt}P(b(t))\right].$$


To prove that $T$ is a sufficient statistic for $\theta$, we must prove that the conditional frequency function of $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$, given $T$, is independent of $\theta$. To do this we show that this property holds in each of the two regions indicated in Figure 1, namely in the regions below and above the curve $x_{(n)} = b(x_{(1)})$. In the region below the curve, we have

$$k(x_{(1)}, x_{(2)}, \ldots, x_{(n)} \mid T) = \frac{P(x_{(1)})P(x_{(2)})\cdots P(x_{(n)})[Q(\theta)]^{n}}{w(T)}.$$
Obviously this conditional frequency function is independent of $\theta$, since the factor $[Q(\theta)]^n$ cancels against the same factor in $w(T)$. In the region above the curve $x_{(n)} = b(x_{(1)})$, we make the following transformation in the sample space: let $p_1 = x_{(1)}, p_2 = x_{(2)}, \ldots, p_{n-1} = x_{(n-1)}, p_n = T$. Since

$$\frac{\partial(p_1, p_2, \ldots, p_n)}{\partial(x_{(1)}, x_{(2)}, \ldots, x_{(n)})} = \frac{db^{-1}(x_{(n)})}{dx_{(n)}},$$

the transformed likelihood function becomes

$$f(x_{(1)},\theta)f(x_{(2)},\theta)\cdots f(x_{(n)},\theta)\left|\frac{db^{-1}(x_{(n)})}{dx_{(n)}}\right|^{-1}.$$

If we now assume that $b^{-1}(x_{(n)})$ is a strictly monotone decreasing function of $x_{(n)}$, the transformation is one-to-one and $db^{-1}(x_{(n)})/dx_{(n)}$ is unequal to zero except possibly at a set of points in the $x_{(1)}, x_{(n)}$ plane of probability measure zero. We may state then that

$$k(x_{(1)}, x_{(2)}, \ldots, x_{(n)} \mid T) = \frac{P(x_{(1)})P(x_{(2)})\cdots P(x_{(n)})[Q(\theta)]^{n}\left|\dfrac{db^{-1}(x_{(n)})}{dx_{(n)}}\right|^{-1}}{w(T)}.$$

Again this conditional frequency function is independent of $\theta$, so that this property holds throughout the triangle in Figure 1. Hence $T$ is a sufficient statistic for $\theta$.

We proceed to prove that there exists a unique continuous function of $T$ which is an unbiased estimate of $\theta$. This will involve no assumptions beyond those already made. If $\psi(t)$ is an unbiased estimate of $\theta$, we have from (13)

(14) $$E[\psi(T)] = n[Q(\theta)]^{n}\int_{\theta}^{c}\psi(t)\left[\int_t^{b(t)}P(\eta)\,d\eta\right]^{n-1}\left[P(t) - \frac{db}{dt}P(b(t))\right]dt = \theta$$

for $\theta \in \Omega$. Differentiating (14) with respect to $\theta$, we obtain

$$\psi(\theta) = \theta - \frac{1}{n\,\dfrac{d}{d\theta}[\ln Q(\theta)]}.$$

Since $\Omega$ is the interval $\theta \le c$, we obtain

(15) $$\psi(T) = T - \frac{1}{n\,\dfrac{d}{dT}[\ln Q(T)]}.$$

Hence (15), with $T = \min\{x_{(1)}, b^{-1}(x_{(n)})\}$, is the unique continuous function of $T$ which is an unbiased estimate of $\theta$.
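To illustrate (15), consider a hypothetical case not treated in the text: $P(x) = 1$, $a(\theta) = \theta$, $b(\theta) = -\theta$ with $\theta < 0$, i.e. a rectangular distribution on $(\theta, -\theta)$. Then $c = 0$, $[Q(T)]^{-1} = \int_T^{-T}d\eta = -2T$, $\frac{d}{dT}\ln Q(T) = -1/T$, and (15) becomes $\psi(T) = (1 + 1/n)T$ with $T = \min\{x_{(1)}, -x_{(n)}\}$. The sketch below, whose names and parameters are illustrative, computes this estimate and checks its mean by simulation.

```python
import random
import statistics


def pitman_estimate(xs):
    """psi(T) from (15) in the assumed case a(theta) = theta, b(theta) = -theta,
    P(x) = 1 (theta < 0), where (15) reduces to psi(T) = (1 + 1/n) T."""
    n = len(xs)
    T = min(min(xs), -max(xs))   # T = min{x_(1), b^{-1}(x_(n))} with b^{-1}(y) = -y
    return (1 + 1 / n) * T


def pitman_demo(theta=-2.0, n=5, reps=100_000, seed=11):
    """Mean and variance of psi(T) over simulated samples from U(theta, -theta)."""
    rng = random.Random(seed)
    est = [pitman_estimate([rng.uniform(theta, -theta) for _ in range(n)])
           for _ in range(reps)]
    return statistics.fmean(est), statistics.pvariance(est)
```

Here $T = -\max_i |x_i|$, so $E(T) = n\theta/(n+1)$ and the factor $1 + 1/n$ removes the bias exactly; with $\theta = -2$, $n = 5$ the simulated mean should be near $-2$ and the variance near $\theta^2/(n(n+2)) = 4/35$.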


We now require an additional assumption to ensure that $\psi(T)$ given by (15) is a sufficient statistic for $\theta$.

ASSUMPTION D. For almost all $T$ satisfying $\theta \le T \le c$, and for all $\theta \in \Omega$, the function $\ln Q(T)$, where $[Q(T)]^{-1} = \int_T^{b(T)}P(\eta)\,d\eta$, satisfies the inequality

$$\left|\frac{d^2}{dT^2}[\ln Q(T)]\right| \le M\left\{\frac{d}{dT}[\ln Q(T)]\right\}^2,$$

where $M$ is some fixed constant.


The following theorem can be established:

THEOREM 3. If a probability distribution with range from $\theta$ to $b(\theta)$ satisfies Assumptions A, B, and D, with $K(x) = 0$, and if the functions $b(\theta)$ and $b^{-1}(\theta)$ are strictly monotone decreasing for all $\theta \in \Omega$, then the function $\psi(T)$ given by (15), where $T = \min\{x_{(1)}, b^{-1}(x_{(n)})\}$, is the unique best estimate for the unknown parameter $\theta$.

PROOF. Under the above assumptions (minus Assumption D) we have proved that $\psi(T)$ given by (15) is (among all continuous functions of the sufficient statistic $T$) the unique unbiased estimate of $\theta$. However, in order to apply Blackwell’s theorem, we must show that $\psi(T)$ is also a sufficient statistic. From (15) we obtain

$$\frac{d\psi}{dT} = 1 + \frac{1}{n}\,\frac{\dfrac{d^2}{dT^2}[\ln Q(T)]}{\left\{\dfrac{d}{dT}[\ln Q(T)]\right\}^2}.$$

From Assumption D it follows that for all sample sizes $n \ge 1$ we have $d\psi/dT > \sigma > 0$ for some constant $\sigma$. Hence the function $\psi(T)$ establishes a one-to-one correspondence between $T$ and $\psi(T)$ except possibly at a set of points of probability measure zero. Therefore $\psi(T)$ as defined in (15) is a sufficient statistic. It follows immediately from Blackwell’s theorem and the existence of a unique unbiased estimate among all functions of $T$ that $\psi(T)$ is the best estimate of the unknown parameter $\theta$.

THEOREM 4. If a probability distribution with range from $\theta$ to $b(\theta)$ satisfies Assumptions A and B with $K(x) = 0$, and if the upper extremity of the range, $b(\theta)$, is not a strictly monotone decreasing function of $\theta$, there exists no single sufficient statistic for $\theta$ which is a single valued function of the values of $n$ independent drawings from the population.

PROOF. Under the assumptions of the theorem to be established we have proved in Theorem 2 that $x_{(1)}$ and $x_{(n)}$ are a sufficient set of statistics for $\theta$. We may therefore confine our attention to a search for a single valued function $T(x_{(1)}, x_{(n)})$. It is clear that

(16) $$\Pr\{t < T < t + dt\} = n(n-1)[Q(\theta)]^{n}\iint_{t<T<t+dt}P(x_{(1)})P(x_{(n)})\left[\int_{x_{(1)}}^{x_{(n)}}P(\eta)\,d\eta\right]^{n-2}dx_{(1)}\,dx_{(n)}.$$

Since the likelihood function of the ensemble of $n$ independent observations taken from the distribution has (under our assumptions as to its form) the factor $[Q(\theta)]^n$ as the sole term involving $\theta$, it is evident from the definition of sufficiency that the integral

(17) $$\iint_{t<T<t+dt}P(x_{(1)})P(x_{(n)})\left[\int_{x_{(1)}}^{x_{(n)}}P(\eta)\,d\eta\right]^{n-2}dx_{(1)}\,dx_{(n)},$$

when evaluated over the region common to the strip $t < T < t + dt$ and the triangle $\theta \le x_{(1)} \le x_{(n)}$, $\theta \le x_{(n)} \le b(\theta)$ in the $x_{(1)}, x_{(n)}$ plane, must be independent of $\theta$ except in the case in which the strip includes a finite length of either the line $x_{(1)} = \theta$ or the line $x_{(n)} = b(\theta)$. Moreover this restriction must be satisfied uniformly in $\theta$ for $\theta \in \Omega$. The situation is clarified by looking at Figure 2.


[Figure 2: the strip $t < T < t + dt$ and the triangle in the $x_{(1)}, x_{(n)}$ plane.]


It is clear from Figure 2 that if the strip $t < T < t + dt$ does not enter and leave the triangle along the line $x_{(1)} = x_{(n)}$ without crossing either of the other two sides for every $\theta$ in $\Omega$, the integral in (17) will be a function of $\theta$. Suppose that the statistic $T$ is of such a form that one of its strips $t < T < t + dt$ does not consist of the portions of two straight lines as was the case in Figure 1. Then for some $\theta_1 \in \Omega$ this strip will intersect the triangle corresponding to the value $\theta_1$ somewhere along at least one of the lines $x_{(1)} = \theta_1$ or $x_{(n)} = b(\theta_1)$. It follows that the contours $T$ = constant must be of the same type as shown in Figure 1, regardless of the nature of the function $b(\theta)$.

Next we proceed to show that if $b(\theta)$ is not strictly monotone decreasing, the assumption that $T$ is a single valued function of $x_{(1)}$ and $x_{(n)}$ is violated. The argument proceeds as follows: under the assumptions of the theorem $b(\theta)$ is a continuous function of $\theta$ which is not strictly monotone decreasing. Hence there exist at least two values of $\theta$, say $\theta_1$ and $\theta_2$, such that the corresponding contours of fixed $T$, say $T_1$ and $T_2$, intersect in at least one point $P$. The situation is shown in Figure 3. Now obviously $T_1 = T_2$, since otherwise $T(x_{(1)}, x_{(n)})$ would not be a single valued function of $x_{(1)}$ and $x_{(n)}$. From the properties of the function $b(\theta)$ there exists a $\theta_0 \in \Omega$ such that the triangle defined by $\theta_0 \le x_{(1)} \le x_{(n)}$, $\theta_0 \le x_{(n)} \le b(\theta_0)$ includes a finite length of the contour $T_1 = T_2$ = constant. Moreover, since this contour cuts the above triangle at one or more points whose coordinates depend upon the value of $\theta_0$, it follows that if the true value of $\theta$ is $\theta_0$, the integral defined in (17) will be a function of $\theta_0$. Hence $T$ is not a sufficient statistic for $\theta$ for the true value lying in the parameter set $\Omega$.

[Figure 3: contours of constant $T$ in the $x_{(1)}, x_{(n)}$ plane.]


6. An alternative approach when a single sufficient statistic does not exist.
It follows from Theorem 4 that if $b(\theta)$ is not a strictly decreasing monotone function of $\theta$, no single sufficient statistic exists. The question remains as to what to do to obtain an estimate for $\theta$. The following procedure yields an unbiased estimate for a certain function of $\theta$ which is “best” only in the sense that it has minimum variance among the class of all analytic functions of two prescribed functions of $x_{(1)}$ and $x_{(n)}$. The fact that the sufficient statistic first derived by Pitman, i.e., $T = \min\{x_{(1)}, b^{-1}(x_{(n)})\}$, is not an analytic function of $x_{(1)}$ and $x_{(n)}$ throughout the triangle $\theta \le x_{(1)} \le x_{(n)}$, $\theta \le x_{(n)} \le b(\theta)$ suggests that perhaps the best estimate may always be a non-analytic function. In any case the following procedure is suggested for lack of a better one.

Make the transformation of parameter $\varphi = [Q(\theta)]^{-1}$ and the coordinate transformation

$$u = \int_{x_{(1)}}^{x_{(n)}}P(\eta)\,d\eta, \qquad v = \int_c^{x_{(1)}}P(\eta)\,d\eta,$$

where $c$ is any fixed point whatsoever in $R_1$; i.e., $c$ is independent of the value of $\theta$ for any $\theta \in \Omega$. First we will prove a lemma concerning fixed points of the nature of $c$.

LEMMA 1. For a distribution satisfying Assumptions A and B with $K(x) = 0$ and with the additional restriction that the functions $a(\theta)$ and $b(\theta)$ possess first derivatives ($a(\theta)$ and $b(\theta)$ depending non-trivially upon $\theta$), there exists a point $c$ satisfying for all $\theta \in \Omega$ the conditions

1.) $a(\theta) \le c \le b(\theta)$,

2.) $c$ is a fixed $p$-quantile ($0 < p < 1$) of the distribution,

if and only if

$$P[a(\theta)] \ne 0, \qquad P[b(\theta)] \ne 0, \qquad \frac{P[b(\theta)]\,db(\theta)/d\theta}{P[a(\theta)]\,da(\theta)/d\theta} < 0$$

for all $\theta \in \Omega$.
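PROOF. If there exists a fixed point $c$ which is a $p$-quantile, then

(18) $$Q(\theta)\int_{a(\theta)}^{c}P(\eta)\,d\eta = p.$$

Writing $g(\theta) = \dfrac{1}{Q(\theta)} = \int_{a(\theta)}^{b(\theta)}P(\eta)\,d\eta$ and differentiating (18) with respect to $\theta$, we have

$$-\frac{da}{d\theta}P[a(\theta)] = p\left\{\frac{db}{d\theta}P[b(\theta)] - \frac{da}{d\theta}P[a(\theta)]\right\}.$$

Solving for $p$, we obtain

(19) $$p = \frac{1}{1 - \dfrac{P[b(\theta)]\,db/d\theta}{P[a(\theta)]\,da/d\theta}}.$$

Since there is at most one value of $p$ obtained from (19), and since $P(x) > 0$, it follows from (18) that $c$ is a single valued function of $p$. This completes the proof of the lemma.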
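Equations (18) and (19) can be checked on a hypothetical example that satisfies the conditions of Lemma 1: $P(x) = 1$, $a(\theta) = \theta$, $b(\theta) = 3 - 2\theta$ with $\theta < 1$. The ratio in (19) is then $-2$, giving $p = 1/3$, and the fixed point is $c = 1$, since $(1 - \theta)/(3 - 3\theta) = 1/3$ for every $\theta$. The sketch below (all names are illustrative) evaluates both sides numerically with a simple midpoint rule.

```python
def p_from_19(P, a, b, da, db, theta):
    """Equation (19): p = 1 / (1 - P[b] b'(theta) / (P[a] a'(theta)))."""
    ratio = P(b(theta)) * db(theta) / (P(a(theta)) * da(theta))
    return 1.0 / (1.0 - ratio)


def integrate(f, lo, hi, steps=10_000):
    """Midpoint rule; adequate for the smooth integrands used here."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (k + 0.5) * h) for k in range(steps))


def quantile_level(P, a, b, c, theta):
    """Left side of (18): Q(theta) * integral from a(theta) to c of P,
    where 1/Q(theta) is the integral of P from a(theta) to b(theta)."""
    return integrate(P, a(theta), c) / integrate(P, a(theta), b(theta))


# Hypothetical example satisfying Lemma 1: P(x) = 1 on (theta, 3 - 2*theta), theta < 1.
def P(x):
    return 1.0


def a(theta):
    return theta


def b(theta):
    return 3.0 - 2.0 * theta


def da(theta):
    return 1.0


def db(theta):
    return -2.0


if __name__ == "__main__":
    for theta in (-3.0, 0.0, 0.5):
        print(theta, p_from_19(P, a, b, da, db, theta),
              quantile_level(P, a, b, 1.0, theta))
```

For every $\theta$ tried, both printed values should equal $1/3$ up to rounding, which is what the lemma requires of a fixed quantile.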







It is clear from Lemma 1 that in the case we are now considering there exists no fixed point $c$ which is a $p$-quantile of the distribution, since $db/d\theta$ is not negative for all $\theta \in \Omega$. We are now ready to prove the following theorem:

THEOREM 5. For a distribution satisfying Assumptions A and B with $K(x) = 0$ and with the additional restriction that $b(\theta)$ is not a strictly monotone decreasing function of $\theta$ for all $\theta \in \Omega$, there exists among the class of all analytic functions of $u = \int_{x_{(1)}}^{x_{(n)}}P(\eta)\,d\eta$ and $v = \int_c^{x_{(1)}}P(\eta)\,d\eta$ a unique function of $u$ and $v$, namely

$$\frac{n+1}{n-1}\int_{x_{(1)}}^{x_{(n)}}P(\eta)\,d\eta,$$

which is an unbiased estimate for $\varphi$.
PROOF. Under our coordinate transformation to $u$ and $v$ as new variables of integration, $g(x_{(1)},x_{(n)};\varphi)\,dx_{(1)}\,dx_{(n)} = n(n-1)\varphi^{-n}u^{n-2}\,du\,dv$. Introducing a new function of $\theta$, namely $\beta = \int_c^{a(\theta)}P(\eta)\,d\eta$, the condition (11) for unbiasedness becomes, for the new parameter $\varphi$ and in terms of the new variables $u$ and $v$,

(20) $$\int_0^{\varphi}du\int_{\beta}^{\varphi-u+\beta}n(n-1)\varphi^{-n}u^{n-2}\,\psi(u,v)\,dv = \varphi$$

for all $\varphi$ for which $\theta$ lies in $\Omega$, where $\psi(u,v)$ is an estimate of $\varphi$. If we expand $\psi(u,v)$ in a double Taylor series about the point $u = 0$, $v = 0$, it is clear that the only terms which satisfy (20) identically in $\varphi$ are

$$\psi(u,v) = au + bv,$$

where $a$ and $b$ are constants. We will now derive a relationship between $a$ and $b$ by integrating (20). After some easy algebra we obtain the relationship

(21) $$\frac{n-1}{n+1}\,a + \left(\frac{1}{n+1} + \frac{\beta}{\varphi}\right)b = 1.$$
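As a cross-check of (21), added here rather than taken from the text: $\varphi^{-1}u$ and $\varphi^{-1}(v - \beta)$ are, respectively, the range and the smallest value of $n$ independent variables uniform on $(0, 1)$ (namely $F(x_{(n)}) - F(x_{(1)})$ and $F(x_{(1)})$), whose expectations are $(n-1)/(n+1)$ and $1/(n+1)$. Taking expectations of $\psi = au + bv$ then reproduces (21):

```latex
E(u) = \frac{n-1}{n+1}\,\varphi, \qquad E(v) = \beta + \frac{\varphi}{n+1},
\qquad
E(au + bv) = \varphi
\iff
\frac{n-1}{n+1}\,a + \left(\frac{1}{n+1} + \frac{\beta}{\varphi}\right) b = 1 .
```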
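Under the conditions of the theorem it is clear from Lemma 1 that the point $c$ is not a $p$-quantile uniformly in $\varphi$, and hence $\beta/\varphi$ is not a constant independent of $\varphi$ (indeed, $c$ would be a fixed $p$-quantile exactly when $\beta/\varphi = -p$ is constant). Hence the only solution of (21) is given by $a = \dfrac{n+1}{n-1}$, $b = 0$, and the only unbiased estimate of $\varphi$ is

(22) $$\frac{n+1}{n-1}\int_{x_{(1)}}^{x_{(n)}}P(\eta)\,d\eta.$$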
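The estimate (22) can be checked in a hypothetical case where $b(\theta)$ is not decreasing: $P(x) = 1$ on $(\theta, 2\theta + 1)$ with $\theta > -1$, so that $\varphi = [Q(\theta)]^{-1} = \theta + 1$ and (22) reduces to $\frac{n+1}{n-1}(x_{(n)} - x_{(1)})$. The names and parameter values in the sketch are illustrative.

```python
import random
import statistics


def range_estimate(xs):
    """(22) with P(x) = 1: (n+1)/(n-1) * (x_(n) - x_(1))."""
    n = len(xs)
    return (n + 1) / (n - 1) * (max(xs) - min(xs))


def range_demo(theta=0.5, n=6, reps=100_000, seed=5):
    """Assumed example: rectangular on (theta, 2*theta + 1), theta > -1, so that
    b(theta) is increasing and phi = 1/Q(theta) = theta + 1.
    Returns (mean of the estimate, phi)."""
    rng = random.Random(seed)
    lo, hi = theta, 2 * theta + 1
    est = [range_estimate([rng.uniform(lo, hi) for _ in range(n)])
           for _ in range(reps)]
    return statistics.fmean(est), hi - lo
```

With $\theta = 0.5$ the simulated mean of the estimate should be close to $\varphi = 1.5$, reflecting $E(x_{(n)} - x_{(1)}) = \frac{n-1}{n+1}\varphi$ for a rectangular population.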

REFERENCES

[1] E. J. G. Pitman, “Sufficient statistics and intrinsic accuracy,” Proc. Camb. Phil. Soc., Vol. 32 (1936), pp. 567-579.

[2] B. O. Koopman, “On distributions admitting a sufficient statistic,” Trans. Am. Math. Soc., Vol. 39 (1936), pp. 399-409.

[3] David Blackwell, “Conditional expectation and unbiased sequential estimation,” Annals of Math. Stat., Vol. 18 (1947), pp. 105-110.




ON THE DISTRIBUTION OF WALD’S CLASSIFICATION 
STATISTIC 


By Harman Leon HARTER 


Michigan State College 


Summary. In this paper we shall consider the exact distribution of Wald’s 
classification statistic V in the univariate case, some theoretical approximations 
in various multivariate cases, and an empirical distribution in a particular 
multivariate case. We shall also draw some conclusions as to the potential usefulness of the statistic $V$ and the work which remains to be done.

1. Introduction. In many educational and industrial problems it is necessary to classify persons or objects into one of two categories: those fit and those unfit for a particular purpose. In formulating this problem of classification, Wald [1] assumed that for $p$ tests we know the scores of $N_1$ individuals known to belong to population $\Pi_1$, and of $N_2$ individuals known to belong to population $\Pi_2$, along with those of the individual under consideration, a member of the population $\Pi$, where it is known a priori that $\Pi$ is identical with either $\Pi_1$ or $\Pi_2$. He assumed moreover that the distributions of the test scores of the individuals making up $\Pi_1$ and $\Pi_2$ are two $p$-variate normal distributions which have the same covariance matrix, but are independent of each other. In order to classify the individual in question into either $\Pi_1$ or $\Pi_2$, Wald introduced the statistic $V$ defined by the relation





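(1) $$V = \sum_{i=1}^{p}\sum_{j=1}^{p}s^{ij}\,t_{i,n+1}\,t_{j,n+2},$$

where

(2) $$\|s^{ij}\| = \|s_{ij}\|^{-1}, \qquad s_{ij} = \frac{1}{n}\sum_{\alpha=1}^{n}t_{i\alpha}t_{j\alpha},$$

and where the variates $t_{i\beta}$ ($i = 1, \ldots, p$; $\beta = 1, \ldots, n+2$) are normally and independently distributed with unit variance and with expected values

(3) $$E(t_{i\alpha}) = 0 \quad (\alpha = 1, \ldots, n), \qquad E(t_{i,n+1}) = \rho_i, \qquad E(t_{i,n+2}) = \zeta_i,$$

where $\rho_i$ and $\zeta_i$ are constants.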
2. The exact distribution of $V$ when $p = 1$. In the univariate case, the definition (1) reduces to

(4) $$V = s^{11}\,t_{1,n+1}\,t_{1,n+2},$$

where

$$s^{11} = \frac{1}{s_{11}}, \qquad s_{11} = \frac{1}{n}\sum_{\alpha=1}^{n}t_{1\alpha}^2, \qquad n = N_1 + N_2 - 2.$$

























CLASSIFICATION STATISTIC 


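Thus, in the case $p = 1$,

(5) $$V = \frac{t_{1,n+1}\,t_{1,n+2}}{\sum_{\alpha=1}^{n}t_{1\alpha}^2/n} = \frac{xy}{z},$$

where

$$x = t_{1,n+1}, \qquad y = t_{1,n+2}, \qquad z = \sum_{\alpha=1}^{n}t_{1\alpha}^2/n.$$

In the degenerate case ($\rho_1 = \zeta_1 = 0$), $x$ and $y$ are normally distributed with zero means and unit variances, so that their probability laws are

(6) $$P(x) = \frac{1}{\sqrt{2\pi}}e^{-\frac12x^2}, \qquad P(y) = \frac{1}{\sqrt{2\pi}}e^{-\frac12y^2}.$$

Because of symmetry we have then

(7) $$P(|x|) = \sqrt{\frac{2}{\pi}}\,e^{-\frac12x^2}, \qquad P(|y|) = \sqrt{\frac{2}{\pi}}\,e^{-\frac12y^2}.$$

It is well known that $z = \sum_{\alpha=1}^{n}t_{1\alpha}^2/n$ is distributed as $\chi^2/n$, where $\chi^2$ has $n$ degrees of freedom; that is, the probability law for $z$ is

(8) $$P(z) = \frac{(n/2)^{n/2}}{\Gamma(\frac12n)}\,z^{\frac12n-1}e^{-\frac12nz}, \qquad z > 0.$$

Now we proceed to find the probability law of $V = xy/z$ in a manner similar to that used by Shrivastava [2] in investigating a different statistic.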


Let $w = \ln|V| = \ln|x| + \ln|y| - \ln z$. Then the characteristic function of $w$ is given by

(9) $$\varphi(t) = \int_0^\infty\!\!\int_0^\infty\!\!\int_0^\infty |x|^{it}|y|^{it}z^{-it}\,P(|x|)P(|y|)P(z)\,d|x|\,d|y|\,dz.$$

Substituting the values of $P(|x|)$, $P(|y|)$ and $P(z)$ from (7) and (8), and making use of the independence of $x$, $y$ and $z$, we have


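(10) $$\varphi(t) = \frac{2(n/2)^{n/2}}{\pi\,\Gamma(\frac12n)}\int_0^\infty x^{it}e^{-\frac12x^2}dx\int_0^\infty y^{it}e^{-\frac12y^2}dy\int_0^\infty z^{\frac12n-1-it}e^{-\frac12nz}dz.$$

Expressing the integrals in (10) in terms of Gamma functions and simplifying, we find

(11) $$\varphi(t) = \frac{n^{it}}{\pi\,\Gamma(\frac12n)}\,\Gamma(\tfrac12n - it)\left[\Gamma\left(\frac{it+1}{2}\right)\right]^2.$$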
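Formula (11) can be checked through the absolute moments of $V$. Carrying out the same Gamma-function integrals with a real exponent $s$ in place of $it$ (valid for $-1 < s < \frac12n$) gives $E|V|^s = n^s\,\Gamma(\frac12n - s)[\Gamma(\frac{s+1}{2})]^2/(\pi\Gamma(\frac12n))$; for $s = 1$ this is $2n/(\pi(n-2))$, which agrees with $E|x|\,E|y|\,E(1/z) = \frac{2}{\pi}\cdot\frac{n}{n-2}$. The sketch below, whose function names are illustrative, compares the formula with a Monte Carlo estimate in the degenerate case.

```python
import math
import random


def abs_moment_formula(s, n):
    """E|V|^s read off from (11) with a real s in place of it (-1 < s < n/2)."""
    return (n ** s * math.gamma(n / 2 - s) * math.gamma((s + 1) / 2) ** 2
            / (math.pi * math.gamma(n / 2)))


def abs_moment_mc(s, n, reps=100_000, seed=2):
    """Monte Carlo estimate of E|V|^s for V = x*y/z in the degenerate case."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        x, y = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        z = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n)) / n   # chi^2_n / n
        total += abs(x * y / z) ** s
    return total / reps
```

For $n = 6$ and $s = 1$ the formula gives $3/\pi \approx 0.955$, and the simulated value should agree to about two decimal places.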


Upon inserting this result in the Lévy inversion formula

(12) $$P(w) = \frac{1}{2\pi}\int_{-\infty}^{\infty}e^{-itw}\varphi(t)\,dt$$


60 HARMAN LEON HARTER 


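and making the substitution $v = it$, we obtain

(13) $$P(w) = \frac{1}{\pi\,\Gamma(\frac12n)}\cdot\frac{1}{2\pi i}\int_{-i\infty}^{i\infty}e^{-vw}n^{v}\,\Gamma(\tfrac12n - v)\left[\Gamma\left(\frac{v+1}{2}\right)\right]^2dv.$$

Using a property of the Gamma function given by Whittaker and Watson [3],

(14) $$\Gamma(z)\Gamma(1-z) = \frac{\pi}{\sin\pi z},$$

and letting $z = \frac12n - v$, we obtain

(15) $$\Gamma(\tfrac12n - v) = \frac{\pi}{\Gamma(v - \frac12n + 1)\,\sin\pi(\frac12n - v)}.$$

Substituting this value of $\Gamma(\frac12n - v)$ in (13), and simplifying, we find

(16) $$P(w) = \frac{1}{\Gamma(\frac12n)}\cdot\frac{1}{2\pi i}\int_{-i\infty}^{i\infty}\frac{n^{v}e^{-vw}\left[\Gamma\left(\frac{v+1}{2}\right)\right]^2}{\sin\pi(\frac12n - v)\,\Gamma(v - \frac12n + 1)}\,dv.$$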


We shall now perform a contour integration, using as the contour the imaginary axis plus the semicircle in the right half-plane with center at the origin and infinite radius. It can be shown that for $|n/(2e^{w})| < 1$, and hence for $|V| > n/2$, the integral around the semicircular portion of the contour is zero. Hence, under these conditions, the integral on the right side of (16) is equal to $(-2\pi i)$ times the sum of the residues at all the singular points in the right half-plane. The integrand has simple poles at $v = \frac12n, \frac12n + 1, \frac12n + 2, \ldots$, and no other singularities in the right half-plane. Inserting the actual values of the residues, using the fact that $\cos k\pi = (-1)^k$ for $k$ an integer, and letting $v = j + \frac12n$, we find

(17) $$P(w) = \frac{1}{\pi\,\Gamma(\frac12n)}\sum_{j=0}^{\infty}\frac{(-1)^j\,(ne^{-w})^{j+\frac12n}}{\Gamma(j+1)}\left[\Gamma\left(\frac{2j+n+2}{4}\right)\right]^2.$$

Replacing $e^{w}$ by $|V|$ and multiplying by $dw/d|V| = 1/|V|$, we obtain the probability law for $|V|$:

(18) $$P(|V|) = \frac{1}{\pi\,\Gamma(\frac12n)}\sum_{j=0}^{\infty}\frac{(-1)^j\,n^{j+\frac12n}}{\Gamma(j+1)\,|V|^{j+\frac12n+1}}\left[\Gamma\left(\frac{2j+n+2}{4}\right)\right]^2.$$

The infinite series on the right side of (18) converges for precisely those values of $|V|$ for which the integral along the semicircular portion of the path is zero, that is, for $|V| > n/2$. Since the values of $x$ and $y$ are symmetric about zero and independent, the values of $V$ are also symmetric about zero, and hence $P(V) = \frac12P(|V|)$.
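Series (18) can be checked numerically. Integrating it term by term from $v_0$ to infinity gives, for $v_0 > n/2$,
$P(|V| > v_0) = \frac{1}{\pi\Gamma(\frac12n)}\sum_{j\ge0}\frac{(-1)^j (n/v_0)^{j+\frac12n}}{j!\,(j+\frac12n)}\left[\Gamma\left(\frac{2j+n+2}{4}\right)\right]^2$.
The sketch below evaluates this sum (working with logarithms of the Gamma function to avoid overflow) and compares it with a simulation of $V = xy/z$ in the degenerate case; the function names and the truncation at 400 terms are illustrative choices.

```python
import math
import random


def tail_series(v0, n, terms=400):
    """P(|V| > v0) from integrating the series (18) term by term; needs v0 > n/2."""
    if v0 <= n / 2:
        raise ValueError("series (18) converges only for |V| > n/2")
    total = 0.0
    for j in range(terms):
        e = j + n / 2
        # log of (n/v0)^e * Gamma((2j+n+2)/4)^2 / (j! * e)
        log_term = (e * math.log(n / v0) + 2 * math.lgamma((2 * j + n + 2) / 4)
                    - math.lgamma(j + 1) - math.log(e))
        total += (-1) ** j * math.exp(log_term)
    return total / (math.pi * math.gamma(n / 2))


def tail_mc(v0, n, reps=200_000, seed=4):
    """Monte Carlo estimate of P(|V| > v0) for V = x*y/z in the degenerate case."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x, y = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        z = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n)) / n
        if abs(x * y / z) > v0:
            hits += 1
    return hits / reps
```

For $n = 4$ and $v_0 = 3$ both estimates come out near 0.09. The terms of the series shrink roughly by the factor $n/(2v_0)$, which is why the check is restricted to $v_0 > n/2$.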







To obtain a series for $P(|V|)$ which converges when $|V| < n/2$, it would be necessary to perform a contour integration around the left half-plane, which is considerably more difficult, since the presence of $\left[\Gamma\left(\frac{v+1}{2}\right)\right]^2$ in the integrand of (16) introduces double poles at $v = -1, -3, -5, \ldots$.

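If we drop the restriction $\zeta_1 = 0$, but keep $\rho_1 = 0$, $V$ will still be distributed symmetrically about zero, since $x$ is distributed symmetrically about zero and is independent of $y$. The probability laws for $x$ and $z$ will be the same as in the degenerate case, but $P(|V|)$ will be different, due to a change in $P(|y|)$. Since the mean of the distribution of $y$ is now $\zeta_1 \ne 0$, we have

(19) $$P(y) = \frac{1}{\sqrt{2\pi}}e^{-\frac12(y-\zeta_1)^2},$$

which yields

(20) $$P(|y|) = \frac{1}{\sqrt{2\pi}}\left[e^{-\frac12(|y|-\zeta_1)^2} + e^{-\frac12(|y|+\zeta_1)^2}\right] = \sqrt{\frac{2}{\pi}}\,e^{-\frac12(y^2+\zeta_1^2)}\sum_{r=0}^{\infty}\frac{(\zeta_1|y|)^{2r}}{(2r)!}.$$

Proceeding in the same manner as for the degenerate case, we find as the characteristic function of $w = \ln|V|$

(21) $$\varphi(t) = \frac{n^{it}e^{-\frac12\zeta_1^2}}{\pi\,\Gamma(\frac12n)}\,\Gamma(\tfrac12n - it)\,\Gamma\left(\frac{it+1}{2}\right)\sum_{r=0}^{\infty}\frac{(2\zeta_1^2)^r}{(2r)!}\,\Gamma\left(r + \frac{it+1}{2}\right).$$

Again using the Lévy inversion formula (12), and letting $v = it$, we have

(22) $$P(w) = \frac{e^{-\frac12\zeta_1^2}}{\pi\,\Gamma(\frac12n)}\cdot\frac{1}{2\pi i}\int_{-i\infty}^{i\infty}e^{-vw}n^{v}\,\Gamma(\tfrac12n - v)\,\Gamma\left(\frac{v+1}{2}\right)\sum_{r=0}^{\infty}\frac{(2\zeta_1^2)^r}{(2r)!}\,\Gamma\left(r + \frac{v+1}{2}\right)dv.$$

This integral may be evaluated by integrating around the same contour as in the degenerate case. Performing the contour integration and simplifying, we obtain

(23) $$P(w) = \frac{e^{-\frac12\zeta_1^2}}{\pi\,\Gamma(\frac12n)}\sum_{v}\frac{(-1)^{v-\frac12n}(ne^{-w})^{v}}{\Gamma(v - \frac12n + 1)}\,\Gamma\left(\frac{v+1}{2}\right)\sum_{r=0}^{\infty}\frac{(2\zeta_1^2)^r}{(2r)!}\,\Gamma\left(r + \frac{v+1}{2}\right),$$

the outer sum extending over $v = \frac12n, \frac12n + 1, \frac12n + 2, \ldots$. Replacing $e^{w}$ by $|V|$ and multiplying by $dw/d|V| = 1/|V|$, we find

(24) $$P(|V|) = \frac{e^{-\frac12\zeta_1^2}}{\pi\,\Gamma(\frac12n)}\sum_{v}\frac{(-1)^{v-\frac12n}\,n^{v}}{\Gamma(v - \frac12n + 1)\,|V|^{v+1}}\,\Gamma\left(\frac{v+1}{2}\right)\sum_{r=0}^{\infty}\frac{(2\zeta_1^2)^r}{(2r)!}\,\Gamma\left(r + \frac{v+1}{2}\right).$$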
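Letting $v = j + \frac12n$, this may be written in the form

(25) $$P(|V|) = \frac{e^{-\frac12\zeta_1^2}}{\pi\,\Gamma(\frac12n)}\sum_{j=0}^{\infty}\frac{(-1)^j\,n^{j+\frac12n}\,\Gamma\left(\frac{2j+n+2}{4}\right)}{\Gamma(j+1)\,|V|^{j+\frac12n+1}}\sum_{r=0}^{\infty}\frac{(2\zeta_1^2)^r}{(2r)!}\,\Gamma\left(r + \frac{2j+n+2}{4}\right).$$

This expression is valid (since the integral vanishes along the semicircular portion of the contour) and converges for precisely the same values of $V$ as in the degenerate case, that is, for $|V| > n/2$.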

3. Approximate distributions of $V$ in various multivariate cases. Wald [1] has shown that the distribution of the statistic $V$ is the same as that of the statistic

(26) $$\frac{nm_3}{(1-m_1)(1-m_2) - m_3^2},$$

where the joint distribution of $m_1$, $m_2$ and $m_3$ is known. Since $m_1$, $m_2$ and $m_3$ are of the order $1/n$ in the probability sense, the denominator of (26) is near 1 nearly always for sufficiently large $n$. Accordingly, Wald has suggested that even for moderately large $n$, $V$ is distributed approximately as $nm_3$. By integrating out $m_1$ and $m_2$ over the domain for which the joint distribution is real and $\ge 0$, it is possible to find the distribution of $m_3$, and from it the distribution of $nm_3$, which is approximately the distribution of $V$ for sufficiently large $n$. We restrict ourselves to values of $n$ and $p$ satisfying the relation $1 < p \le n$. Four cases have to be considered: (1a) $n$ even, $p$ odd; (1b) $n$ even, $p$ even; (2a) $n$ odd, $p$ even; and (2b) $n$ odd, $p$ odd.

For the degenerate case $\rho_i = \zeta_i = 0$, it can be shown that the joint distribution of $m_1$, $m_2$ and $m_3$ given by Wald [1] reduces to

(27) $$C\left[(1-m_1)(1-m_2) - m_3^2\right]^{(n-p-1)/2}\left[m_1m_2 - m_3^2\right]^{(p-3)/2}dm_1\,dm_2\,dm_3,$$

where $C$ is a constant. In integrating out $m_1$ and $m_2$, we must be careful to integrate over only the domain for which the joint distribution (27) is real and $\ge 0$. This requires that the following inequalities hold:

(28) $$m_1m_2 - m_3^2 \ge 0, \qquad (1-m_1)(1-m_2) - m_3^2 \ge 0.$$

From these it follows that the limits for $m_1$ and $m_2$ are

(29) $$\frac{1 - \sqrt{1-4m_3^2}}{2} \le m_1 \le \frac{1 + \sqrt{1-4m_3^2}}{2}, \qquad \frac{m_3^2}{m_1} \le m_2 \le 1 - \frac{m_3^2}{1-m_1}.$$

For Case 1a ($n$ even, $p$ odd), let $p = 3 + 2c$, where $c$ is an integer $\ge 0$. The distribution function $G_{n,p}(m_3)$ can then be expressed as a double integral, as follows:


(30) $$G_{n,3+2c}(m_3) = C\int_{(1-\sqrt{1-4m_3^2})/2}^{(1+\sqrt{1-4m_3^2})/2}\int_{m_3^2/m_1}^{1-m_3^2/(1-m_1)}\left[(1-m_1)(1-m_2) - m_3^2\right]^{(n-4)/2-c}\left[m_1m_2 - m_3^2\right]^{c}\,dm_2\,dm_1.$$

Expanding repeatedly by the binomial theorem and integrating out $m_2$, then expanding again and integrating out $m_1$, we find


$$G_{n,3+2c}(m_3) = C\sum_{j=0}^{(n-4)/2-c}\binom{(n-4)/2-c}{j}\sum_{k}(\cdots)\,\bigl[A_{j,k,c}(m_3) + B_{j,k,c}(m_3) - C_{j,k}(m_3) - D_{j,k}(m_3)\bigr], \qquad (31)$$

where $A_{j,k,c}(m_3)$, $B_{j,k,c}(m_3)$ and $C_{j,k}(m_3)$ are given by (32) to (34) [the inner factors of (31) and the expressions (32) to (34) are not legible in the scanned source], and




64 HARMAN LEON HARTER 


(35) $D_{j,k}(m_3) = \cdots$ [not legible in the scanned source],

the terms involving natural logarithms being taken to have the value zero when $m_3 = 0$. As a numerical example we have, after normalization,


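$$G_{10,3}(m_3) = \frac{180}{\pi}\left[\left(\frac{1}{16} + \frac{59}{24}m_3^2 + \frac{1}{24}m_3^4 + \frac14 m_3^6\right)\sqrt{1-4m_3^2} - \left(m_3^2 + \frac32 m_3^4 - \frac12 m_3^8\right)\ln\frac{1+\sqrt{1-4m_3^2}}{1-\sqrt{1-4m_3^2}}\right]. \qquad (36)$$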
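As a numerical cross-check of (30) and (36), the Python sketch below (helper names are ours, not the author's, whose evaluation was analytic) integrates (30) by the midpoint rule over the limits (29) for $n = 10$, $c = 0$, and compares the result with the closed form (36), coded in `g_closed_10_3`. It assumes $C = 180/\pi$, the value that makes the joint density integrate to one for $n = 10$, $p = 3$; the normalization is checked numerically rather than taken on trust.

```python
import math


def m2_limits(m3):
    """Bounds on m2 from (29): the roots of m2**2 - m2 + m3**2 = 0."""
    root = math.sqrt(1.0 - 4.0 * m3 * m3)
    return (1.0 - root) / 2.0, (1.0 + root) / 2.0


def m1_limits(m2, m3):
    """Bounds on m1 from (29) for fixed m2 and m3."""
    s = m3 * m3
    return s / m2, 1.0 - s / (1.0 - m2)


def g_double(m3, n=10, c=0, const=180.0 / math.pi, steps=300):
    """Midpoint-rule value of the double integral (30), Case 1a (p = 3 + 2c)."""
    e = (n - 4) // 2 - c              # exponent of (1-m1)(1-m2) - m3**2
    s = m3 * m3
    lo2, hi2 = m2_limits(m3)
    h2 = (hi2 - lo2) / steps
    total = 0.0
    for i in range(steps):
        m2 = lo2 + (i + 0.5) * h2
        lo1, hi1 = m1_limits(m2, m3)
        h1 = (hi1 - lo1) / steps
        inner = 0.0
        for j in range(steps):
            m1 = lo1 + (j + 0.5) * h1
            d = max((1.0 - m1) * (1.0 - m2) - s, 0.0)
            q = max(m1 * m2 - s, 0.0)
            inner += d ** e * q ** c
        total += inner * h1
    return const * total * h2


def g_closed_10_3(m3):
    """Closed form (36) for n = 10, p = 3."""
    s = m3 * m3
    t = math.sqrt(1.0 - 4.0 * s)
    poly = 1.0 / 16 + 59.0 / 24 * s + s ** 2 / 24 + s ** 3 / 4
    log_coef = s + 1.5 * s ** 2 - 0.5 * s ** 4
    log_term = log_coef * math.log((1.0 + t) / (1.0 - t)) if s > 0 else 0.0
    return 180.0 / math.pi * (poly * t - log_term)


for m3 in (0.0, 0.1, 0.25, 0.4):
    print(m3, round(g_double(m3), 5), round(g_closed_10_3(m3), 5))
```

The inner integral over $m_1$ could be done analytically as well; keeping both integrals numerical makes the check independent of the algebra behind (36).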
For Case 1b ($n$ even, $p$ even), let $p = 2 + 2c$, where $c \ge 0$ is an integer. The distribution function $G_{n,p}(m_3)$ can then be expressed as a double integral, as follows:


$$G_{n,2+2c}(m_3) = C\int_{(1-\sqrt{1-4m_3^2})/2}^{(1+\sqrt{1-4m_3^2})/2}\int_{m_3^2/m_2}^{1-m_3^2/(1-m_2)}[(1-m_1)(1-m_2)-m_3^2]^{(n-3)/2-c}\,[m_1m_2-m_3^2]^{c-1/2}\,dm_1\,dm_2. \qquad (37)$$


This double integration can be performed by the use of certain formulas given 
by Peirce [4], and after evaluation we have 


(38) [The evaluated expression for $G_{n,2+2c}(m_3)$, a constant multiple of double sums over $j$ and $k$ of signed binomial coefficients times $A_{j,k}(m_3)$ and $B_{j,k}(m_3)$, is not legible in the scanned source.]







[The definitions of $A_{j,k}(m_3)$ and $B_{j,k}(m_3)$ are not legible in the scanned source.] They are expressed in terms of

$$m = k + c - j - \tfrac12, \qquad m' = j - k - c + \tfrac12.$$





As a numerical illustration we have, after normalization, 


(42) [The normalized closed form for this Case 1b example is not legible in the scanned source.]


In Cases 2a and 2b, infinite series of elliptic integrals occur, and it appears that approximate integration is the best that can be done.

The author plans a later paper on the distribution for the nondegenerate case $\rho_1 \ne 0$, $\xi_1 \ne 0$.

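For small values of $n$, Wald's approximation $nm_3$ is not applicable. One can obtain a fair approximation by replacing $1/[(1-m_1)(1-m_2) - m_3^2]$ in (26) by its average with respect to $m_1$ and $m_2$ over the domain, taking account of the joint distribution function (27). Since multiplying the integrand of (27) by this factor lowers the exponent of $(1-m_1)(1-m_2) - m_3^2$ by one, which is the same as replacing $n$ by $n-2$, this yields

$$V \doteq \frac{C_{n,p}\,G_{n-2,p}(m_3)}{C_{n-2,p}\,G_{n,p}(m_3)}\;nm_3, \qquad (43)$$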


where $C_{n-2,p}$ and $C_{n,p}$ are the constants in the joint distribution of $m_1$, $m_2$ and $m_3$ for the values of $n$ and $p$ involved. The approximation (43), while rather crude, is better than Wald's $nm_3$ for small values of $n$, and asymptotically equivalent to it as $n \to \infty$.

4. An empirical distribution of V. A sampling experiment was performed in order to obtain an empirical distribution of 1000 values of $V$ for $n = 10$, $p = 3$, $\rho_1 = \xi_1 = 0$. Ten thousand wooden beads were stamped with two-digit numbers whose distribution approximates as nearly as possible that of a normal population with mean 50 and standard deviation 10. One thousand sets of values $x_{i\alpha}$ ($i = 1, 2, 3$; $\alpha = 1, 2, \ldots, 12$) were obtained by sampling with replacement from this population. The values $x_{i\alpha}$ were expressed in standard units $t_{i\alpha}$, using


$$t_{i\alpha} = \frac{x_{i\alpha} - 50}{10}. \qquad (44)$$


From the standard variables $t_{i\alpha}$, one thousand values of $V$ were calculated by means of (1) and (2), using IBM equipment. The resulting empirical distribution is given in Table 1. This distribution was compared with the theoretical approximation (43), which is, for $n = 10$, $p = 3$, $\rho_1 = \xi_1 = 0$,


$$V \doteq \frac{150}{7}\,m_3\,\frac{G_{8,3}(m_3)}{G_{10,3}(m_3)}. \qquad (45)$$


The approximation fits the observed distribution fairly well for the central 
classes, but underestimates the frequencies of large values of | V | quite badly. 
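For $p = 3$ the constant $C_{n,3}$ in (27) has a closed form. By our own calculation (a sketch, not taken from the paper), integrating $[(1-m_1)(1-m_2)-m_3^2]^{e}$ with $e = (n-4)/2$ over $m_1$, then $m_3$, then $m_2$ gives $B(\tfrac12, e+2)\,B(\tfrac32, e+\tfrac32)/(e+1)$, where $B$ is the beta function. The Python below uses this to confirm $C_{10,3} = 180/\pi$ and $C_{10,3}/C_{8,3} = 15/7$, which accounts for the factor $150/7 = 10 \cdot 15/7$ in (45). Because the constants cancel against those inside $G_{n,p}$, (43) is just $nm_3$ times a ratio of two integrals; `v_approx` (an illustrative name) evaluates it with the $m_1$ integral done analytically.

```python
import math


def beta(a, b):
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


def c_const_p3(n):
    """Normalizing constant C_{n,3} of the joint density (27) when p = 3."""
    e = (n - 4) / 2
    return (e + 1) / (beta(0.5, e + 2) * beta(1.5, e + 1.5))


def kernel_integral(m3, e, steps=4000):
    """Integral over m1, m2 of [(1-m1)(1-m2) - m3^2]^e for p = 3.

    The m1 integral is done analytically, leaving a one-dimensional
    midpoint rule over the m2 range of (29).
    """
    s = m3 * m3
    root = math.sqrt(1.0 - 4.0 * s)
    lo, hi = (1.0 - root) / 2.0, (1.0 + root) / 2.0
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        a = lo + (i + 0.5) * h                 # a plays the role of m2
        q = max(a * (1.0 - a) - s, 0.0)
        total += (q / a) ** (e + 1) / ((e + 1) * (1.0 - a))
    return total * h


def v_approx(m3, n):
    """Approximation (43), p = 3: n*m3 times the mean of 1/[(1-m1)(1-m2) - m3^2]."""
    e = (n - 4) / 2
    return n * m3 * kernel_integral(m3, e - 1) / kernel_integral(m3, e)


print(c_const_p3(10) * math.pi)            # about 180, i.e. C_{10,3} = 180/pi
print(c_const_p3(10) / c_const_p3(8))      # about 15/7
print(v_approx(0.2, 10))
```

Near $m_3 = 0$ the ratio of integrals tends to $(e+1)^2/e^2$, so for $n = 10$ the approximation behaves like $\tfrac{16}{9}\,nm_3$ rather than $nm_3$, which is one way to see why Wald's large-sample form is poor for small $n$.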

5. Conclusions. The statistic V is potentially very useful, but much work 
remains to be done in obtaining the necessary information about its distribution, 
especially in the small sampling case, and tabulating the associated probabilities. 
Even in the univariate case, where the exact distribution is known, the amount of labor involved in determining probabilities is very great and a simple approximation is needed, unless a high speed computing device is available. For the multivariate small sampling case, only a crude approximation to the distribution of $V$ is available, and the exact distribution or a better approximation is needed.


TABLE 1

Frequency distribution of 1000 empirical values of $V$ for $n = 10$, $p = 3$, $\rho_1 = \xi_1 = 0$ (class marks integers)
Frequency Class mark 


Frequency 


f Class mark Froquaney 


| 

1 12 3 se ing 15 
| 11 3 -~9 12 

1 10 3 —10 
6 
10 


«ofl 

—12 

16 —13 

—14 

—15 

54 —18 

85 7 

140 —23 
181 

~~] 134 “Fe 

—2 101 —28 

—3 52 —29 

—4 26 se 

—5 17 — 36 

—6 23 —37 
—7 12 


9 
8 
7 
6 
5 
4 
3 
2 
1 
0 


$\bar V = -.0700$, $\sigma_V = 5.938$


The author wishes to express his sincere thanks to the Office of Naval Re- 
search for the grant which made this work possible, and to Professor Carl F. 
Kossack of Purdue University for his helpful suggestions and patient guidance. 

REFERENCES

[1] A. Wald, "On a statistical problem arising in the classification of an individual into one of two groups," Annals of Math. Stat., Vol. 15 (1944), pp. 145-162.
[2] M. P. Shrivastava, "On the $D^2$-statistic," Bull. Calcutta Math. Soc., Vol. 33 (1941), pp. 71-86.
[3] E. T. Whittaker and G. N. Watson, Modern Analysis, 4th ed., Cambridge University Press, London, 1940, p. 239.
[4] B. O. Peirce, A Short Table of Integrals, Ginn and Co., Boston, 1929, pp. 18-19, Formulas 113, 118, 119, 120, 121, 123.







RATIOS INVOLVING EXTREME VALUES¹

By W. J. Dixon
University of Oregon

1. Summary. Ratios of the form $(x_n - x_{n-j})/(x_n - x_i)$ for small values of $i$ and $j$ and $n = 3, \ldots, 30$ are discussed. The variables concerned are order statistics, i.e., sample values such that $x_1 < x_2 < \cdots < x_n$. Analytic results are obtained for the distributions of these ratios for several small values of $n$, and percentage values are tabled for these distributions for samples of size $n \le 30$.

2. Introduction. There has been interest in the problem of gross errors in 
data since Chauvenet presented his solution for the problem about 1850. His 
hypothesis was essentially that in some samples a small portion of the observa- 
tions were from a population with a different mean value. There has been re- 
search from that time up to the present on procedures suitable for treating 
such data. 

If it is assumed that a certain percentage of "gross errors" may occur, then there are two general procedures for treating such data:

(1) A statistical treatment may be given to the data which gives very little weight to such aberrant values as may occur.

(2) A statistical test may be constructed which will indicate such values so that they may be rejected.

The functions to be discussed here were designed for testing the consistency 
of suspected values with the sample as a whole. Investigation of the performance 
of these criteria is given in another paper. 

3. Critical values for $r_{10}$. The first statistic to be considered is

$$r_{10} = (x_n - x_{n-1})/(x_n - x_1),$$

where the subscripts on the $x$'s indicate ordered values such that $x_1 < x_2 < \cdots < x_n$. The density function for $x_1$, $x_{n-1}$, $x_n$ is


$$\frac{n!}{(n-3)!}\left(\int_{x_1}^{x_{n-1}} f(t)\,dt\right)^{n-3} f(x_1)\,f(x_{n-1})\,f(x_n)\,dx_1\,dx_{n-1}\,dx_n. \qquad (1)$$


Setting $v = x_n - x_1$, $rv = x_n - x_{n-1}$, $x = x_n$, and integrating $x$ and $v$ over their range of definition, we have the density function of $r_{10}$ for a sample of size $n$. (The subscripts on the $r$'s will be dropped when there is no ambiguity.) This function appears as


$$n(n-1)(n-2)\int_{-\infty}^{\infty}\int_0^{\infty}\left(\int_{x-v}^{x-rv} f(t)\,dt\right)^{n-3} f(x-v)\,f(x-rv)\,f(x)\,v\,dv\,dx. \qquad (2)$$


¹ The work presented in this paper was done under contract N6-onr-218/IV with the Office of Naval Research.







EXTREME VALUES 69 


There will be no loss in generality by considering the values $x_i$ to have been drawn from a distribution with zero mean and unit variance, since the statistic is the ratio of two differences. It should also be noted that for symmetric populations, the distribution of $(x_n - x_{n-1})/(x_n - x_1)$ will be the same as that of $(x_2 - x_1)/(x_n - x_1)$. For the rectangular distribution the density function is


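$$(n-2)(1-r_{10})^{n-3} \qquad (0 \le r_{10} \le 1), \qquad (3)$$

and the cdf is

$$1 - (1 - R_{10})^{n-2}. \qquad (4)$$

If we set this expression equal to $1 - \alpha$ we obtain critical values of $R_{10}$:

$$R_\alpha = 1 - \alpha^{1/(n-2)}. \qquad (5)$$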
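Formula (5) is easy to check by simulation. The Python sketch below (helper names are illustrative) assumes a uniform $(0, 1)$ population, which loses no generality for the rectangular case because $r_{10}$ is unchanged by location and scale, computes $R_\alpha = 1 - \alpha^{1/(n-2)}$, and estimates $\Pr(r_{10} > R_\alpha)$ from simulated samples; the estimate should be close to $\alpha$.

```python
import random


def r10(sample):
    """Dixon's ratio r10 = (x_n - x_{n-1}) / (x_n - x_1)."""
    xs = sorted(sample)
    return (xs[-1] - xs[-2]) / (xs[-1] - xs[0])


def critical_uniform(n, alpha):
    """Upper critical value R_alpha of r10 for a rectangular population, eq. (5)."""
    return 1.0 - alpha ** (1.0 / (n - 2))


def tail_frequency(n, R, reps=20000, seed=1):
    """Monte Carlo estimate of Pr(r10 > R) for samples of size n from U(0, 1)."""
    rng = random.Random(seed)
    hits = sum(r10([rng.random() for _ in range(n)]) > R for _ in range(reps))
    return hits / reps


for n in (3, 5, 10):
    R = critical_uniform(n, 0.10)
    print(n, round(R, 4), tail_frequency(n, R))
```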


For the more interesting case of the normal distribution, the operations in- 
dicated above are much more arduous. 


n = 3, Normal population. The integral in (2) above can be evaluated to obtain the density function of $r_{10}$ for the assumption of normality


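$$g_3(r_{10}) = \frac{3\sqrt3}{2\pi}\cdot\frac{1}{r_{10}^2 - r_{10} + 1}, \qquad 0 \le r_{10} \le 1. \qquad (6)$$

The integration of this density results in the cdf

$$\frac{3}{\pi}\arctan\left[\frac{2}{\sqrt3}\left(R_{10} - \frac12\right)\right] + \frac12. \qquad (7)$$

Upon setting this last expression equal to $1 - \alpha$, we obtain

$$R_\alpha = \frac12 + \frac{\sqrt3}{2}\tan\left[\frac{\pi}{3}\left(\frac12 - \alpha\right)\right]. \qquad (8)$$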
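The closed forms for $n = 3$ can be checked numerically. The Python sketch below (helper names are ours) codes the density $\tfrac{3\sqrt3}{2\pi}(r^2 - r + 1)^{-1}$, the cdf $\tfrac{3}{\pi}\arctan[\tfrac{2}{\sqrt3}(R - \tfrac12)] + \tfrac12$ and the inverse $R_\alpha = \tfrac12 + \tfrac{\sqrt3}{2}\tan[\tfrac{\pi}{3}(\tfrac12 - \alpha)]$, confirms that the density integrates to one and that the inverse is consistent with the cdf, and prints critical values. Rounded to three places they agree with the entries .988, .976, .886, .781, .684, .591, .500 that can be read in the first row of Table I for $\alpha$ = .01, .02, .10, .20, .30, .40, .50.

```python
import math

SQRT3 = math.sqrt(3.0)


def g3(r):
    """Density of r10 for n = 3 from a normal population."""
    return 3.0 * SQRT3 / (2.0 * math.pi) / (r * r - r + 1.0)


def cdf3(R):
    """Cumulative distribution of r10 for n = 3."""
    return 3.0 / math.pi * math.atan(2.0 / SQRT3 * (R - 0.5)) + 0.5


def critical3(alpha):
    """Upper critical value: Pr(r10 > R_alpha) = alpha."""
    return 0.5 + SQRT3 / 2.0 * math.tan(math.pi / 3.0 * (0.5 - alpha))


steps = 10000
area = sum(g3((i + 0.5) / steps) for i in range(steps)) / steps
print(round(area, 6))                     # total probability
for alpha in (0.01, 0.02, 0.05, 0.10, 0.20, 0.30, 0.40, 0.50):
    print(alpha, round(critical3(alpha), 3))
```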


n = 4, Normal population. The density function in this case becomes

(9) [The expression for $g_4(r_{10})$ is not legible in the scanned source.]
If we now set the cdf equal to $1 - \alpha$ we obtain

(10) [An equation combining two arctangent terms in $R$, set equal to $1 - \alpha$; not legible in the scanned source.]

which may be written as follows by taking the tangent of both sides of this equation:

(11) [Not legible in the scanned source.]







70 W. J. DIXON 


The integration of $g_4(r_{10})$ was performed for the first term by substituting $r = \tfrac12 + (1/\sqrt2)\sqrt{z^2 - 1}$. The second term of $g_4(r_{10})$ is identical with the first if one substitutes $s = 1/r$.

n = 5, Normal population. For this case it can be shown that the density function has the following form:

(12) [The expression for $g_5(r_{10})$, and the auxiliary function defined after it, are not legible in the scanned source.]


The cdf for $n = 5$ has not been obtained in a form comparable to those obtained for $n = 3, 4$. No such expressions were obtained for larger values of $n$. Various percentage values were computed from the above distributions and are presented in Table I. The percentage values were also obtained by numerical integrations for $n$ = 5, 7, 10, 15, 20, 25, 30. Values for other values of $n$ were obtained by interpolation. These percentage values can be obtained by a double quadrature since


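$$G(R_{10}) = \int_0^{R_{10}}\int_{-\infty}^{\infty}\int_0^{\infty} g(r, x, v)\,dv\,dx\,dr = 1 - n(n-1)\int_{-\infty}^{\infty}\int_0^{\infty}\left(\int_{x-v}^{x-R_{10}v} f(t)\,dt\right)^{n-2} f(x)\,f(x-v)\,dv\,dx. \qquad (13)$$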


This integral was evaluated for all combinations of the values of $n$ indicated above and for $R_{10}$ = 0, .06, .10, .16, .21, .26, .30, .34, .40, .44, .48, .53, .56, .60, .80, .90. These values are not regularly spaced since several computations were made before it was possible to select the particular values of $R$ which would be most useful for evaluating $G(R_{10})$. The values of the integral in (13) were used as the base for computations for all the tables included in this paper.
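The double quadrature in (13) can be sketched in a few lines of Python for a normal $f$. The integration limits, step counts and helper names below are our own choices (the original computations were done by other means), and for $n = 3$ the result is compared with the closed-form cdf $\tfrac{3}{\pi}\arctan[\tfrac{2}{\sqrt3}(R - \tfrac12)] + \tfrac12$.

```python
import math

SQRT2PI = math.sqrt(2.0 * math.pi)


def phi(x):
    return math.exp(-0.5 * x * x) / SQRT2PI


def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def G_r10(R, n, x_lim=8.0, v_max=12.0, nx=200, nv=200):
    """Pr(r10 <= R) for a normal sample of size n via the double integral (13).

    Midpoint rule in x over [-x_lim, x_lim] and in v over [0, v_max].
    """
    hx = 2.0 * x_lim / nx
    hv = v_max / nv
    total = 0.0
    for i in range(nx):
        x = -x_lim + (i + 0.5) * hx
        fx = phi(x)
        for j in range(nv):
            v = (j + 0.5) * hv
            inner = Phi(x - R * v) - Phi(x - v)    # integral of f from x-v to x-Rv
            total += inner ** (n - 2) * fx * phi(x - v)
    return 1.0 - n * (n - 1) * total * hx * hv


def G3_closed(R):
    return 3.0 / math.pi * math.atan(2.0 / math.sqrt(3.0) * (R - 0.5)) + 0.5


for R in (0.3, 0.6, 0.886):
    print(R, round(G_r10(R, 3), 4), round(G3_closed(R), 4))
```

At $R = 1$ the inner probability vanishes and $G = 1$ exactly, while at $R = 0$ the double integral is the total probability of $(x_1, x_n)$ and $G = 0$; both serve as quick sanity checks on a quadrature of this kind.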

4. Distribution of other ratios. It can be suggested that a ratio to test whether $x_n$ is significantly far from $x_{n-1}$ should avoid $x_1$. Let us consider $r_{11} = (x_n - x_{n-1})/(x_n - x_2)$. Its cdf is

$$1 - n(n-1)(n-2)\int_{-\infty}^{\infty}\int_0^{\infty}\left(\int_{-\infty}^{x-v} f(t)\,dt\right)\left(\int_{x-v}^{x-Rv} f(t)\,dt\right)^{n-3} f(x-v)\,f(x)\,dv\,dx. \qquad (14)$$

For the rectangular distribution we obtain the density function

$$(n-3)(1-r_{11})^{n-4}. \qquad (15)$$

For the rectangular distribution we can write down the density function of $r_{1,k-1} = (x_n - x_{n-1})/(x_n - x_k)$ as

$$(n-k-1)(1-r_{1,k-1})^{n-k-2}, \qquad (16)$$

where $k = 1, 2, \ldots, n-2$.







n = 4, Normal population. When we assume the normal distribution for our $f(x)$ and consider $k = 2$, the first sample size of interest is $n = 4$; here $r_{11} = (x_4 - x_3)/(x_4 - x_2)$. The density function may be obtained for this ratio by the procedures used above for $r_{10}$. The helpful substitution here is $r_{11} = \cdots$ [not legible in the scanned source]. The resulting expression is

(17) [Not legible in the scanned source.]

and the cdf is

(18) [Not legible in the scanned source.]


If we now set this function equal to $1 - \alpha$, we may solve for the various percentage values for this distribution.

n = 5, Normal population. The distribution of the similar ratio for samples of size five, $r_{11} = (x_5 - x_4)/(x_5 - x_2)$, is integrable into an expression similar to the distribution of $r_{10}$ for $n = 5$. The percentage values for the distribution of $r_{11}$ for $n = 4, \ldots, 30$ are in Table II. The distribution of $r_{11}$ for samples of size 5 is


[The expression for the density of $r_{11}$ for $n = 5$, and the definitions of the auxiliary symbols used in it and in the expressions that follow, are not legible in the scanned source.]
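The percentage values of the distribution of the ratio $r_{12} = (x_n - x_{n-1})/(x_n - x_3)$ are in Table III. The general expression for the cdf is

$$1 - \frac{n!}{2\,(n-4)!}\int_{-\infty}^{\infty}\int_0^{\infty}\left(\int_{-\infty}^{x-v} f(t)\,dt\right)^{2}\left(\int_{x-v}^{x-Rv} f(t)\,dt\right)^{n-4} f(x-v)\,f(x)\,dv\,dx.$$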


The smallest sample size for which this ratio will have meaning is $n = 5$. The density function for $n = 5$ is

[Not legible in the scanned source.]

Percentage values have been computed in a similar manner for $r_{20} = (x_n - x_{n-2})/(x_n - x_1)$, $r_{21} = (x_n - x_{n-2})/(x_n - x_2)$, $r_{22} = (x_n - x_{n-2})/(x_n - x_3)$




and are presented in Tables IV, V, and VI. Here again analytic expressions can be obtained for the distribution of a particular ratio for small values of $n$. We have the distribution of $r_{20}$ for $n = 4$, since for this sample size $r_{20} + r_{10}' = 1$ if we consider $r_{10}' = (x_2 - x_1)/(x_4 - x_1)$, which for symmetric populations has the same distribution as $r_{10}$.

For $n = 5$ the density function of $r_{20}$ is

[Not legible in the scanned source.]

For $n = 5$ the density function of $r_{21}$ is

[Not legible in the scanned source.]


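The distribution for the ratio $r_{ij} = (x_n - x_{n-i})/(x_n - x_{j+1})$ is

$$\frac{n!}{j!\,(i-1)!\,(n-i-j-2)!}\int_{-\infty}^{\infty}\int_0^{\infty}\left(\int_{-\infty}^{x-v} f(t)\,dt\right)^{j}\left(\int_{x-v}^{x-rv} f(t)\,dt\right)^{n-i-j-2}\left(\int_{x-rv}^{x} f(t)\,dt\right)^{i-1} f(x-v)\,f(x-rv)\,f(x)\,v\,dv\,dx.$$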
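When no closed form is available, percentage points of any $r_{ij}$ can also be approximated by simulation. The Monte Carlo sketch below is a different technique from the quadratures used for the tables; the number of replications, the seed and the helper names are arbitrary choices of ours. It estimates the upper $\alpha$ point of $r_{ij} = (x_n - x_{n-i})/(x_n - x_{j+1})$ for normal samples.

```python
import random


def dixon_ratio(sample, i, j):
    """r_ij = (x_n - x_{n-i}) / (x_n - x_{j+1}) for the ordered sample x_1 < ... < x_n."""
    xs = sorted(sample)
    n = len(xs)
    return (xs[n - 1] - xs[n - 1 - i]) / (xs[n - 1] - xs[j])


def upper_point(n, i, j, alpha, reps=20000, seed=12345):
    """Monte Carlo estimate of R with Pr(r_ij > R) = alpha for normal samples of size n."""
    if n < i + j + 2:
        raise ValueError("ratio is degenerate for this n")
    rng = random.Random(seed)
    values = sorted(
        dixon_ratio([rng.gauss(0.0, 1.0) for _ in range(n)], i, j)
        for _ in range(reps)
    )
    return values[min(int((1.0 - alpha) * reps), reps - 1)]


print(round(upper_point(3, 1, 0, 0.10), 3))   # compare with .886 in Table I
print(round(upper_point(5, 1, 1, 0.05), 3))
print(round(upper_point(10, 2, 2, 0.05), 3))
```

With 20,000 replications the standard error of such an estimate is a few units in the third decimal place, comparable to the accuracy claimed for the tables below.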


5. Final remarks.

5.1. Accuracy of tables. The goal with respect to accuracy was to obtain three places of accuracy in the percentage values. It is believed that the values in Tables I, II, III are in error by not more than one or two in the third place, while the values in Tables IV, V, and VI are believed to be accurate to within three or four units in the third place.

5.2. Investigation of the performance of the ratios. It is important to know something about the performance of these ratios for various purposes. Reference is made to another paper [1] evaluating the performance of these criteria as well as a number of others.
REFERENCE

[1] W. J. Dixon, "Analysis of extreme values," Annals of Math. Stat., Vol. 21 (1950), pp. 488-506.





TABLE I
$\Pr(r_{10} > R) = \alpha$





α: .01 .02 .05 .10 .20 .30 .40 .50





-988 .976 . -886 .781 .684 .591 .500 . 
-889 .846 . ‘ -560 . .394 .324 . 
-780 .729 . é ‘ . -308 .250 .1f 


-644 . 482 . é 261 . 
586 . ‘ ‘ ; -230 .184 . 
-543 . ‘ ‘ ‘ .208 .166 . 
483 . 

-460 . 

441 . 

425 . 

.399 . 




.363 . 
356 . 





344 . 
333 4 







TABLE II 
$\Pr(r_{11} > R) = \alpha$
00S 01 -02 05 10 20 30 4 50 60 .70 80 90 495 Yi 


-995 .991 .981 .955 .910 .822 .737 .648 .554 . -362 .250 . 069 
-937 . -876 .807 .728 .615 .524 .444 .369 . -224 .151 .07 


-839 .805 . -689 .609 .502 .420 .350 .288 . ‘ 113 . -028 
782 . -689 . -530 .432 .359 .298 .241 . -140 .093 . .022 
725 . : 554 . 385 . ; -210 . -121 .079 . 

.677 . : 512 . 352 . ‘ -189 . 

-639 . : J -409 .325 . ‘ : ‘ .098 . 


.580 . : d -367 .289 . : ‘ ‘ .084 . 
539 . 
522 . 


508 . 
495 . 
484 . 
473 . 











009 | 
.009 

: .009 

: .009 

; .008 

$ .008 

a .008 

: 008 

i 008 






TABLE III
$\Pr(r_{12} > R) = \alpha$


2 320 40 


> 


-669 . 
-465 . 
374 . 
-317 . 
-270 . 
258 . 




232 . 
217 . 
-204 . 
-193 . 
184 . 


S877 ix 
-170 . 
-163 . 
157 . 
153 . 




145 . 










TABLE IV
$\Pr(r_{20} > R) = \alpha$








01 02 0S . -20 0 A .50 


-992 .987 .967 . d 807 . -676 . 
.929 .901 . d -694 . é .500 . 





-836 .800 .7% 


732 . q ° d ‘ 355 . 
-670 . 


-667 .627 . ‘ ‘ ‘ ‘ .288 . 
592 . ‘ : ; 4 -268 . 
-603 .564 . 


-520 . 





TABLE V
$\Pr(r_{21} > R) = \alpha$





TABLE VI
$\Pr(r_{22} > R) = \alpha$


s\e 05 0 22 25 010 WW 8 40 30 & 20 20 90 39 f 


-998 . -992 . n -930 .880 . -780 .720 .640 . 
-970 . 919 . : -780 .730 . -610 .540 .470 . 
922 . -857 . -745 .664 .602 . -490 .434 .375 . 
873 . -800 . . -592 .530 . .425 .373 . 
-826 . ‘ j ‘ -543 .483 . ‘ 335 . 


-703 . “ -503 .446 . ‘ -305 . 
-661 . . -470 .416 . , 282 . 
-628 . ' -443 .391 . 
-602 . ° -421 .370 . 

-402 .353 . 


-542 . ‘ .3873 .325 . 
527 . ‘ -361 . 


-502 . ‘ -340 .295 . 


‘ 287 . 
481 . ‘ -323 .280 . 
472 . ‘ -316 .274 . 
484 . 










ON INFORMATION AND SUFFICIENCY

By S. Kullback and R. A. Leibler

The George Washington University and Washington, D. C.


1. Introduction. This note generalizes to the abstract case Shannon's definition of information [15], [16]. Wiener's information (p. 75 of [18]) is essentially the same as Shannon's although their motivation was different (cf. footnote 1, p. 95 of [16]) and Shannon apparently has investigated the concept more completely. R. A. Fisher's definition of information (intrinsic accuracy) is well known (p. 709 of [6]). However, his concept is quite different from that of Shannon and Wiener, and hence ours, although the two are not unrelated as is shown in paragraph 2.

R. A. Fisher, in his original introduction of the criterion of sufficiency, required "that the statistic chosen should summarize the whole of the relevant information supplied by the sample," (p. 316 of [5]). Halmos and Savage in a recent paper, one of the main results of which is a generalization of the well known Fisher-Neyman theorem on sufficient statistics to the abstract case, conclude, "We think that confusion has from time to time been thrown on the subject by ..., and (c) the assumption that a sufficient statistic contains all the information in only the technical sense of 'information' as measured by variance," (p. 241 of [8]). It is shown in this note that the information in a sample as defined herein, that is, in the Shannon-Wiener sense, cannot be increased by any statistical operations and is invariant (not decreased) if and only if sufficient statistics are employed. For a similar property of Fisher's information see p. 717 of [6], Doob [19].

We are also concerned with the statistical problem of discrimination ([3], [17]), by considering a measure of the "distance" or "divergence" between statistical populations ([1], [2], [13]) in terms of our measure of information. For the statistician two populations differ more or less according as to how difficult it is to discriminate between them with the best test [14]. The particular measure of divergence we use has been considered by Jeffreys ([10], [11]) in another connection. He is primarily concerned with its use in providing an invariant density of a priori probability. A special case of this divergence is Mahalanobis' generalized distance [13].

We shall use the notation of Halmos and Savage [8] and that of [7]. 

2. Information. Assume given the probability spaces $(X, \mathcal{S}, \mu_i)$, $i = 1, 2$, such that $\mu_1 \equiv \mu_2$¹ (cf. p. 228 of [8]) and let $\lambda$ be a probability measure such that $\lambda \equiv \{\mu_1, \mu_2\}$ (e.g., $\lambda$ may be $\mu_1$, or $\mu_2$, or $\tfrac12(\mu_1 + \mu_2)$, etc.). By the Radon-Nikodym theorem [7] there exist $f_i(x)$, $i = 1, 2$, unique up to sets of measure zero in $\lambda$,


¹ If $\mu_1(E) \ne 0$, $\mu_2(E) = 0$ or $\mu_1(E) = 0$, $\mu_2(E) \ne 0$ for some $E \in \mathcal{S}$, then we can discriminate perfectly between the populations. The assumption $\mu_1 \equiv \mu_2$, that is, that $\mu_1$ and $\mu_2$ are absolutely continuous with respect to each other, is made to avoid this situation.







80 S. KULLBACK AND R. A. LEIBLER
measurable $\lambda$ with $0 < f_i(x) < \infty$ [$\lambda$], $i = 1, 2$, such that

$$\mu_i(E) = \int_E f_i(x)\,d\lambda(x) \qquad (2.1)$$


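for all $E \in \mathcal{S}$. If $H_i$, $i = 1, 2$, is the hypothesis that $x$ was selected from the population whose probability measure is $\mu_i$, $i = 1, 2$, then we define

$$\log\frac{f_1(x)}{f_2(x)} \qquad (2.2)$$

as the information² in $x$ for discrimination between $H_1$ and $H_2$. The mean information for discrimination between $H_1$ and $H_2$ per observation from $E \in \mathcal{S}$ for $\mu_1$ is given by (cf. pp. 18, 19 of [16]; p. 76 of [18])

$$I_{1:2}(E) = \frac{1}{\mu_1(E)}\int_E d\mu_1(x)\log\frac{f_1(x)}{f_2(x)} = \frac{1}{\mu_1(E)}\int_E f_1(x)\log\frac{f_1(x)}{f_2(x)}\,d\lambda(x) \quad \text{for } \mu_1(E) > 0,$$
$$I_{1:2}(E) = 0 \quad \text{for } \mu_1(E) = 0. \qquad (2.3)$$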


It should be noted that $I_{1:2}(E)$ in (2.3) is well defined in that the integral in its definition always exists even though it may be $+\infty$, since the measures are finite measures.³ It is shown in Lemma 3.2 that

$$I_{1:2}(E) \ge \log\frac{\mu_1(E)}{\mu_2(E)} \quad \text{for } \mu_1(E) > 0.$$


We shall denote by $I(1:2)$ the mean information for discrimination between $H_1$ and $H_2$ per observation from $\mu_1$; i.e.,⁴

$$I(1:2) = I_{1:2}(X) = \int d\mu_1(x)\log\frac{f_1(x)}{f_2(x)} = \int f_1(x)\log\frac{f_1(x)}{f_2(x)}\,d\lambda(x). \qquad (2.4)$$
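On a finite sample space with $\lambda$ taken as counting measure, $f_i$ is just the probability mass function, and (2.3) and (2.4) become finite sums. The Python sketch below is our own illustration (the two distributions are arbitrary, natural logarithms are used); it computes $I_{1:2}(E)$ and $I(1:2)$ and checks the properties stated in Lemmas 3.1 and 3.2.

```python
import math


def mean_info(p1, p2, E=None):
    """I_{1:2}(E) of (2.3) for pmfs p1, p2 with all entries positive.

    E is a collection of indices; None means the whole space, giving I(1:2) of (2.4).
    """
    idx = range(len(p1)) if E is None else list(E)
    mass1 = sum(p1[k] for k in idx)
    if mass1 == 0:
        return 0.0
    return sum(p1[k] * math.log(p1[k] / p2[k]) for k in idx) / mass1


p1 = [0.1, 0.2, 0.3, 0.4]
p2 = [0.25, 0.25, 0.25, 0.25]
print(mean_info(p1, p2))              # I(1:2) > 0 (Lemma 3.1)
print(mean_info(p1, p1))              # 0.0 when the distributions coincide
E = [2, 3]
bound = math.log(sum(p1[k] for k in E) / sum(p2[k] for k in E))
print(mean_info(p1, p2, E), ">=", bound)   # Lemma 3.2
```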


² It follows from Bayes' theorem [12] that

$$\log\frac{f_1(x)}{f_2(x)} = \log\frac{P(H_1 \mid x)}{P(H_2 \mid x)} - \log\frac{a_1}{a_2},$$

where $a_i$, $i = 1, 2$, are the a priori probabilities and $P(H_i \mid x)$, $i = 1, 2$, the a posteriori probabilities of $H_i$, $i = 1, 2$, respectively.

³ We are indebted to a referee for this remark as well as for the following example, which shows that the assumptions at the beginning of this paragraph do not imply finiteness of information. Take $E = (0, 1)$, $\mu_1$ = Lebesgue measure, $f_2(x)/f_1(x) = k\,e^{-1/x}$, where $1/k = \int_0^1 e^{-1/t}\,dt$. It is easily verified that $I(1:2)$ is infinite (cf. also p. 187 of [9]).

⁴ We shall omit the region of integration when it is the entire space.





INFORMATION AND SUFFICIENCY 


$$J(E) = I_{1:2}(E) + I_{2:1}(E) = \frac{1}{\mu_1(E)}\int_E d\mu_1(x)\log\frac{f_1(x)}{f_2(x)} + \frac{1}{\mu_2(E)}\int_E d\mu_2(x)\log\frac{f_2(x)}{f_1(x)}$$
$$= \int_E\left(\frac{f_1(x)}{\mu_1(E)} - \frac{f_2(x)}{\mu_2(E)}\right)\log\frac{f_1(x)}{f_2(x)}\,d\lambda(x). \qquad (2.5)$$

We denote by $J(1, 2)$ the "divergence" between $\mu_1$ and $\mu_2$ (cf. p. 158 of [11]) so that

$$J(1, 2) = J(X) = \int \bigl(f_1(x) - f_2(x)\bigr)\log\frac{f_1(x)}{f_2(x)}\,d\lambda(x). \qquad (2.6)$$

Shannon ([15], [16]) defined information on a finite discrete space and we note that $I_{1:2}(E)$ defined in (2.3) is precisely the generalization of that information which is obtained when one replaces the finite space by $\mathcal{S} \cap E$, the measure of equidistribution by $\mu_2/\mu_2(E)$ and the measure whose information is being defined by $\mu_1/\mu_1(E)$. Just as Shannon observed that certain theorems were carried over to the Lebesgue case, we shall see here that they may be formally carried over to the general case.

For the parametric case in which $f_1(x) = f(x, \theta)$ and $f_2(x) = f(x, \theta + \Delta\theta)$, where $\theta$ and $\theta + \Delta\theta$ are neighboring points in the $k$-dimensional parameter space, with suitable assumptions on the density function (e.g., see p. 774 of [4]), to within second order terms it is found that


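$$I(\theta; \theta + \Delta\theta) = \tfrac12\sum_{\alpha,\beta} g_{\alpha\beta}\,\Delta\theta_\alpha\,\Delta\theta_\beta, \qquad \alpha, \beta = 1, \ldots, k, \qquad (2.7)$$
$$J(\theta, \theta + \Delta\theta) = \sum_{\alpha,\beta} g_{\alpha\beta}\,\Delta\theta_\alpha\,\Delta\theta_\beta, \qquad \alpha, \beta = 1, \ldots, k, \qquad (2.8)$$

where

$$g_{\alpha\beta} = \int f(x, \theta)\,\frac{\partial \log f(x, \theta)}{\partial\theta_\alpha}\,\frac{\partial \log f(x, \theta)}{\partial\theta_\beta}\,d\lambda(x) \qquad (2.9)$$

are the elements of Fisher's information matrix (cf. par. 3.9 of [11]).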
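The second-order approximations (2.7) and (2.8) can be illustrated with the Poisson family, for which $g_{11} = 1/\theta$ (a standard fact, not taken from the paper). In the sketch below $I(\theta; \theta + \Delta\theta)$ is computed by summing (2.4) term by term, with counting measure as $\lambda$, and compared with $\tfrac12 g_{11}\Delta\theta^2$; $J$ is compared with $g_{11}\Delta\theta^2$. The parameter values are arbitrary.

```python
import math


def poisson_pmf(k, lam):
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def info_poisson(lam1, lam2, kmax=100):
    """I(1:2) of (2.4) for two Poisson laws, summed term by term (natural logs)."""
    total = 0.0
    for k in range(kmax + 1):
        log_ratio = k * math.log(lam1 / lam2) - lam1 + lam2   # log f1(k)/f2(k)
        total += poisson_pmf(k, lam1) * log_ratio
    return total


theta, delta = 2.0, 0.02
fisher = 1.0 / theta                      # g_11 for the Poisson mean
I_12 = info_poisson(theta, theta + delta)
J_12 = I_12 + info_poisson(theta + delta, theta)
print(I_12, 0.5 * fisher * delta ** 2)    # compare with (2.7)
print(J_12, fisher * delta ** 2)          # compare with (2.8)
```

The relative discrepancy is of order $\Delta\theta/\theta$, as expected when third-order terms are dropped.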
When $\mu_1$ and $\mu_2$ are multivariate normal populations with a common matrix of variances and covariances then

$$J(1, 2) = \sum_{\alpha,\beta}\delta_\alpha\,\delta_\beta\,\sigma^{\alpha\beta}, \qquad \alpha, \beta = 1, \ldots, k, \qquad (2.10)$$

where $\delta_\alpha$, $\alpha = 1, \ldots, k$, are the differences of the respective population means and $\sigma^{\alpha\beta}$, $\alpha, \beta = 1, \ldots, k$, are the elements of the inverse of the common matrix


⁵ We are indebted to a referee for the comments with respect to Shannon's definition as well as for the comment that this should be of interest to anyone who has puzzled over Wiener's statement that his definition of "information" can be used to replace Fisher's definition in the technique of statistics (p. 76 of [18]).




of variances and covariances; i.e., $J(1, 2)$ in (2.10) is $k$ times Mahalanobis' generalized distance [13].
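For $k = 1$, (2.10) says that two normal populations with common variance $\sigma^2$ and means differing by $\delta$ have $J(1, 2) = \delta^2/\sigma^2$. The Python sketch below checks this by numerical integration of (2.6), and also evaluates the quadratic form in (2.10) for a $k = 2$ example using an explicit $2 \times 2$ inverse. The numbers are arbitrary and the helper names are ours.

```python
import math


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def divergence_numeric(mu1, mu2, sigma, lo=-40.0, hi=40.0, steps=80000):
    """J(1, 2) of (2.6) for N(mu1, sigma^2) and N(mu2, sigma^2) by the midpoint rule."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        # log f1/f2 written out analytically, so no 0/0 arises far in the tails
        log_ratio = ((x - mu2) ** 2 - (x - mu1) ** 2) / (2.0 * sigma ** 2)
        total += (normal_pdf(x, mu1, sigma) - normal_pdf(x, mu2, sigma)) * log_ratio
    return total * h


def divergence_formula(delta, cov):
    """Right-hand side of (2.10) for k = 2: sum of delta_a * delta_b * sigma^{ab}."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    return sum(delta[r] * delta[s] * inv[r][s] for r in range(2) for s in range(2))


print(divergence_numeric(0.0, 1.5, 2.0), 1.5 ** 2 / 2.0 ** 2)
print(divergence_formula([1.0, 2.0], [[2.0, 0.5], [0.5, 1.0]]))
```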

3. Some properties of information. 

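Lemma 3.1. $I(1:2)$ is almost positive definite; i.e., $I(1:2) \ge 0$ with equality if and only if $f_1(x) = f_2(x)$ [$\lambda$].

Proof.⁶ Let $g(x) = f_1(x)/f_2(x)$. Then

$$I(1:2) = \int f_2(x)\,g(x)\log g(x)\,d\lambda(x) = \int g(x)\log g(x)\,d\mu_2(x). \qquad (3.1)$$

If we write $\varphi(t) = t\log t$, then since $0 < g(x) < \infty$ [$\lambda$] and

$$\int g(x)\,d\mu_2(x) = \int f_1(x)\,d\lambda(x) = 1, \qquad (3.2)$$

we may write

$$\varphi(g(x)) = \varphi(1) + [g(x) - 1]\,\varphi'(1) + \tfrac12[g(x) - 1]^2\,\varphi''(h(x)), \qquad (3.3)$$

where $h(x)$ lies between $g(x)$ and 1, so that $0 < h(x) < \infty$ [$\lambda$]. Since $\varphi(1) = 0$ and, by (3.2), $\int [g(x) - 1]\,d\mu_2(x) = 0$, therefore

$$\int \varphi(g(x))\,d\mu_2(x) = \tfrac12\int [g(x) - 1]^2\,\varphi''(h(x))\,d\mu_2(x), \qquad (3.4)$$

where $\varphi''(t) = 1/t > 0$ for $t > 0$. It therefore follows from (3.4) that

$$\int g(x)\log g(x)\,d\mu_2(x) \ge 0 \qquad (3.5)$$

with equality if and only if $g(x) = 1$ [$\lambda$].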
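Lemma 3.2.

$$I_{1:2}(E) \ge \log\frac{\mu_1(E)}{\mu_2(E)} \quad \text{for } \mu_1(E) > 0.$$

Proof. If $I_{1:2}(E) = \infty$, the result is trivial. For finite $I_{1:2}(E)$ apply Lemma 3.1 to

$$I_{1:2}(E) - \log\frac{\mu_1(E)}{\mu_2(E)} = \int_E \frac{d\mu_1(x)}{\mu_1(E)}\log\frac{f_1(x)/\mu_1(E)}{f_2(x)/\mu_2(E)}.$$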


Theorem 3.1. $I(1:2)$ is additive for independent random events⁷; i.e.,

$$I_{XY}(1:2) = I_X(1:2) + I_Y(1:2).$$

⁶ This is essentially the proof on p. 151 of [9].
⁷ Shannon (p. 21 of [16]) and Wiener (p. 77 of [18]) prove similar results. This is clearly a fundamental property which information must possess, and is one of the a priori requirements set down by Shannon in arriving at his definition.





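Proof. Under independence $f_i(x, y) = f_i^X(x)\,f_i^Y(y)$, $i = 1, 2$, so that

$$I_{XY}(1:2) = \int f_1(x, y)\log\frac{f_1(x, y)}{f_2(x, y)}\,d\lambda(x, y) = \iint f_1^X(x)\,f_1^Y(y)\log\frac{f_1^X(x)\,f_1^Y(y)}{f_2^X(x)\,f_2^Y(y)}\,d\lambda(x)\,d\lambda(y)$$
$$= \int f_1^X(x)\log\frac{f_1^X(x)}{f_2^X(x)}\,d\lambda(x) + \int f_1^Y(y)\log\frac{f_1^Y(y)}{f_2^Y(y)}\,d\lambda(y) = I_X(1:2) + I_Y(1:2).$$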
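Theorem 3.1 is easy to confirm on finite spaces, where independence means the joint mass function is the product of the marginals. The sketch below (arbitrary distributions, counting measure as $\lambda$, natural logarithms, illustrative helper names) compares $I_{XY}(1:2)$ with $I_X(1:2) + I_Y(1:2)$.

```python
import math
from itertools import product


def info(p1, p2):
    """I(1:2) = sum p1 log(p1/p2) for strictly positive pmfs."""
    return sum(a * math.log(a / b) for a, b in zip(p1, p2))


def joint(px, py):
    """Mass function of (X, Y) when X and Y are independent."""
    return [a * b for a, b in product(px, py)]


px1, px2 = [0.2, 0.8], [0.5, 0.5]
py1, py2 = [0.1, 0.3, 0.6], [0.3, 0.3, 0.4]
lhs = info(joint(px1, py1), joint(px2, py2))
rhs = info(px1, px2) + info(py1, py2)
print(lhs, rhs)
```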


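4. Transformations and invariance of $I(1:2)$. Consider the measurable transformation $T$ of the probability spaces $(X, \mathcal{S}, \mu_i)$ onto the probability spaces $(Y, \mathcal{T}, \nu_i)$ and suppose for $G \in \mathcal{T}$, $\nu_i(G) = \mu_i(T^{-1}G)$, $i = 1, 2$. Then $\nu_1 \equiv \nu_2 \equiv \gamma$, where $\gamma = \lambda T^{-1}$. We define

$$I_{1:2}(G) = \frac{1}{\nu_1(G)}\int_G d\nu_1(y)\log\frac{g_1(y)}{g_2(y)} = \frac{1}{\nu_1(G)}\int_G g_1(y)\log\frac{g_1(y)}{g_2(y)}\,d\gamma(y), \qquad (4.1)$$

$$J(G) = \int_G\left(\frac{g_1(y)}{\nu_1(G)} - \frac{g_2(y)}{\nu_2(G)}\right)\log\frac{g_1(y)}{g_2(y)}\,d\gamma(y), \qquad (4.2)$$

where $g_i(y)$ is defined by

$$\nu_i(G) = \int_G g_i(y)\,d\gamma(y), \qquad (4.3)$$

for all $G \in \mathcal{T}$.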

Theorem 4.1. $I(1:2) \ge I'(1:2)$, with equality if and only if $T$ is a sufficient statistic.

Proof. If $I(1:2) = \infty$ the result is trivial. By Lemma 3 of Halmos and Savage [8]

$$I'(1:2) = \int d\mu_1(x)\log\frac{g_1T(x)}{g_2T(x)}. \qquad (4.4)$$

Then

$$I(1:2) - I'(1:2) = \int d\mu_1(x)\left[\log\frac{f_1(x)}{f_2(x)} - \log\frac{g_1T(x)}{g_2T(x)}\right] = \int d\mu_1(x)\log\frac{f_1(x)\,g_2T(x)}{f_2(x)\,g_1T(x)}. \qquad (4.5)$$

If we set $g(x) = \dfrac{f_1(x)\,g_2T(x)}{f_2(x)\,g_1T(x)}$,

$$I(1:2) - I'(1:2) = \int \frac{f_2(x)\,g_1T(x)}{g_2T(x)}\,g(x)\log g(x)\,d\lambda(x) = \int g(x)\log g(x)\,d\mu_2'(x), \qquad (4.6)$$

where $\mu_2'(E) = \int_E \dfrac{f_2(x)\,g_1T(x)}{g_2T(x)}\,d\lambda(x)$ for all $E \in \mathcal{S}$. Since

$$\int g(x)\,d\mu_2'(x) = \int f_1(x)\,d\lambda(x) = 1,$$

the method of Lemma 3.1 leads to the conclusion that $I(1:2) - I'(1:2) \ge 0$ with equality if and only if

$$\frac{f_1(x)}{f_2(x)} = \frac{g_1T(x)}{g_2T(x)} \quad [\lambda]. \qquad (4.7)$$

But (4.7) implies that

$$\frac{f_1(x)}{f_2(x)}\;(\epsilon)\;T^{-1}(\mathcal{T}) \quad [\lambda], \qquad (4.8)$$

which is by Corollary 2 of Halmos and Savage [8] necessary and sufficient that the statistic $T$ be sufficient for a homogeneous set of measures on $\mathcal{S}$. If $T$ is sufficient then by the same proof⁸ as Theorem 1 of Halmos and Savage [8], $f_1(x)$ and $f_2(x)$ are $(\epsilon)\,T^{-1}(\mathcal{T})$ [$\lambda$]. Then by Lemma 2 of Halmos and Savage [8] and the definition of $g_1$ and $g_2$, $f_1(x) = g_1T(x)$ [$\lambda$], $f_2(x) = g_2T(x)$ [$\lambda$], and the result in (4.7) follows.

Corollary 4.1. $I(1:2) = I'(1:2)$ if $T$ is non-singular.

Proof. If $T$ is non-singular, $T^{-1}(\mathcal{T})$ is $\mathcal{S}$ and therefore $f_i(x)\;(\epsilon)\;T^{-1}(\mathcal{T})$, $i = 1, 2$. The result then follows from Theorem 4.1.
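Theorem 4.1 and its equality condition can be seen in a very small example: $x = (x_1, x_2)$ consists of two independent Bernoulli trials with success probability $\theta_1$ under $H_1$ and $\theta_2$ under $H_2$ (values chosen arbitrarily). The sufficient statistic $T(x) = x_1 + x_2$ preserves $I(1:2)$, while $T(x) = x_1$, which is not sufficient, keeps only half of it. The Python below (illustrative helper names) computes the induced distributions $\nu_i(G) = \mu_i(T^{-1}G)$ directly.

```python
import math
from collections import defaultdict
from itertools import product


def info(p, q):
    """I(1:2) for finite distributions given as dicts of strictly positive masses."""
    return sum(p[k] * math.log(p[k] / q[k]) for k in p)


def bernoulli_pair(theta):
    """Distribution of x = (x1, x2): two independent Bernoulli(theta) trials."""
    return {
        x: (theta if x[0] else 1 - theta) * (theta if x[1] else 1 - theta)
        for x in product((0, 1), repeat=2)
    }


def induced(dist, T):
    """nu(G) = mu(T^{-1} G): push the distribution forward through T."""
    out = defaultdict(float)
    for x, mass in dist.items():
        out[T(x)] += mass
    return dict(out)


mu1, mu2 = bernoulli_pair(0.3), bernoulli_pair(0.6)
full = info(mu1, mu2)
by_sum = info(induced(mu1, sum), induced(mu2, sum))                          # sufficient
by_first = info(induced(mu1, lambda x: x[0]), induced(mu2, lambda x: x[0]))  # not sufficient
print(full, by_sum, by_first)
```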

Theorem 4.2.⁹ $I_{1:2}(T^{-1}G) = I_{1:2}(G)$ for all $G \in \mathcal{T}$ if and only if $I(1:2) = I'(1:2)$.

Proof.

$$I_{1:2}(G) = \frac{1}{\nu_1(G)}\int_G d\nu_1(y)\log\frac{g_1(y)}{g_2(y)} = \frac{1}{\mu_1(T^{-1}G)}\int_{T^{-1}G} d\mu_1(x)\log\frac{g_1T(x)}{g_2T(x)},$$

while

$$I_{1:2}(T^{-1}G) = \frac{1}{\mu_1(T^{-1}G)}\int_{T^{-1}G} d\mu_1(x)\log\frac{f_1(x)}{f_2(x)}.$$

Application of the method of Theorem 4.1 completes the proof.


⁸ Note that the $\lambda$ in Theorem 1 of [8] is different from the $\lambda$ here. However, as remarked by a referee, the same proof will suffice.
⁹ We are indebted to a referee for calling this to our attention.







5. Properties of $J(1, 2)$. For each of the results in paragraphs 3 and 4 there can be stated an identical one for $J(1, 2)$. This follows from its definition in (2.5) and (2.6). Also it should be noted that $J(1, 2)$ is symmetric with respect to $\mu_1$ and $\mu_2$ and independent of the a priori probabilities. Jeffreys (par. 3.9 of [11]) mentioned the symmetry, positive definiteness and additivity, and invariance for non-singular transformations.

6. Application. Two indications of simple application of these concepts may be useful.

(1). Consider the problem of testing an hypothesis presented by Lehmann (p. 2 of [20]). Let the subscript 1 refer to Lehmann's hypothesis $H$, the subscript 2 refer to any of the alternatives, $F = \{-2, 2\}$, $G = \{0\}$; then

(6.1) [The values of $I_{1:2}(F)$ and $I_{1:2}(G)$ are not legible in the scanned source.]

It may be readily verified that $I_{1:2}(G) < I_{1:2}(F)$ and therefore $G$, i.e. $\{0\}$, should be used as the critical region.

(2). Suppose it is necessary to decide whether a sample of $n$ observations has been drawn from the multinomial population $\{p_1, p_2, \ldots, p_k\}$ or $\{1/k, 1/k, \ldots, 1/k\}$. Because of certain limitations the test must be made under the following conditions:

a) Sequential analysis cannot be used.
b) The observations must be grouped into two mutually exclusive categories.

If it is assumed that $p_1 \ge p_2 \ge \cdots \ge p_k$, then the most effective grouping is such that

$$J' = \left(\sum_{i=1}^{r} p_i - \frac{r}{k}\right)\log\frac{\sum_{i=1}^{r} p_i}{r/k} + \left(\sum_{i=r+1}^{k} p_i - \frac{k-r}{k}\right)\log\frac{\sum_{i=r+1}^{k} p_i}{(k-r)/k} \qquad (6.2)$$

is a maximum. The efficiency of the grouped test is measured by


$$J'/J, \qquad (6.3)$$

where

$$J = \sum_{i=1}^{k}\left(p_i - \frac{1}{k}\right)\log\frac{p_i}{1/k}, \qquad (6.4)$$

in the sense that $n$ observations of the grouped test will provide as much information as $N$ observations of the ungrouped test, where

$$nJ' = NJ. \qquad (6.5)$$
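The quantities (6.2) and (6.4) are straightforward to compute. The Python sketch below is our own illustration: it assumes the alternative is the uniform distribution $\{1/k, \ldots, 1/k\}$, pools the first $r$ categories against the rest as in (6.2), and uses base-10 logarithms as in the example that follows. For $p = (.5, .3, .1, .1)$ it reproduces $J = .1986$ and the printed values of $J'$ for $r = 1, 3, 4$. For $r = 2$, however, it gives about $.1806$, exactly twice the printed $.0903$; with that value $r = 2$ rather than $r = 1$ would maximize (6.2), so this entry of (6.6) may be worth checking against the original.

```python
import math


def j_full(p, log=math.log10):
    """J of (6.4): divergence between {p_i} and the uniform {1/k}."""
    k = len(p)
    return sum((pi - 1.0 / k) * log(pi * k) for pi in p)


def j_grouped(p, r, log=math.log10):
    """J' of (6.2): categories 1..r pooled against categories r+1..k."""
    k = len(p)
    total = 0.0
    for observed, expected in ((sum(p[:r]), r / k), (sum(p[r:]), (k - r) / k)):
        if expected > 0:                  # an empty group contributes nothing
            total += (observed - expected) * log(observed / expected)
    return total


p = [0.5, 0.3, 0.1, 0.1]
J = j_full(p)
print(round(J, 4))
for r in range(1, len(p) + 1):
    print(r, round(j_grouped(p, r), 4), round(j_grouped(p, r) / J, 4))
```

The unrounded ratio for $r = 1$ is about .6006; the printed .6007 corresponds to dividing the rounded values .1193/.1986.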


For example if $p_1 = .5$, $p_2 = .3$, $p_3 = .1$, $p_4 = .1$, then using logarithms to base 10, $J'$ for $r = 1, 2, 3, 4$ becomes respectively

(6.6) .1193, .0903, .0716, 0.0,




and in this case $J$ is 0.1986. The most effective grouping is therefore $(p_1)$, $(p_2 + p_3 + p_4)$ and the grouped case is $.1193/.1986 = .6007$ times as efficient as the ungrouped test; i.e., there is a loss of 40% because of the grouping.

REFERENCES

[1] A. Bhattacharyya, "On a measure of divergence between two statistical populations defined by their probability distributions," Bull. Calcutta Math. Soc., Vol. 35 (1943), pp. 99-109.

[2] A. Bhattacharyya, "On a measure of divergence between two multinomial populations," Sankhyā, Vol. 7 (1946), pp. 401-406.

[3] G. W. Brown, "Basic principles for construction and application of discriminators," Jour. Clinical Psych., Vol. 6 (1950), pp. 58-61.

[4] J. L. Doob, "Probability and statistics," Trans. Am. Math. Soc., Vol. 36 (1934), pp. 759-775.

[5] R. A. Fisher, "On the mathematical foundations of theoretical statistics," Philos. Trans. Roy. Soc. London, Ser. A, Vol. 222 (1921), pp. 309-368.

[6] R. A. Fisher, "Theory of statistical estimation," Proc. Cambridge Philos. Soc., Vol. 22 (1925), pp. 700-725.

[7] P. R. Halmos, Measure Theory, D. Van Nostrand, 1950.

[8] P. R. Halmos and L. J. Savage, "Application of the Radon-Nikodym theorem to the theory of sufficient statistics," Annals of Math. Stat., Vol. 20 (1949), pp. 225-241.

[9] G. H. Hardy, J. E. Littlewood, and G. Pólya, Inequalities, Cambridge University Press, 1934.

[10] H. Jeffreys, "An invariant form for the prior probability in estimation problems," Proc. Roy. Soc. London, Ser. A, Vol. 186 (1946), pp. 453-461.

[11] H. Jeffreys, Theory of Probability, 2nd ed., Oxford, 1948.

[12] A. Kolmogoroff, Grundbegriffe der Wahrscheinlichkeitsrechnung, Ergebnisse der Mathematik und ihrer Grenzgebiete, Julius Springer, Berlin, 1933.

[13] P. C. Mahalanobis, "On the generalized distance in statistics," Proc. Nat. Inst. of Sciences of India, Vol. 2 (1) (1936).

[14] E. Mourier, "Étude du choix entre deux lois de probabilité," C. R. Acad. Sci. Paris, Vol. 223 (1946), pp. 712-714.

[15] C. E. Shannon, "A mathematical theory of communication," Bell System Technical Journal, Vol. 27 (1948), pp. 379-423; pp. 623-656.

[16] C. E. Shannon and W. Weaver, The Mathematical Theory of Communication, University of Illinois Press, Urbana, 1949.

[17] B. L. Welch, "Note on discriminant functions," Biometrika, Vol. 31 (1939), pp. 218-219.

[18] N. Wiener, Cybernetics, John Wiley and Sons, 1948.

[19] J. L. Doob, "Statistical estimation," Trans. Am. Math. Soc., Vol. 39 (1936), pp. 410-421.

[20] E. L. Lehmann, "Some principles of the theory of testing hypotheses," Annals of Math. Stat., Vol. 21 (1950), pp. 1-26.





ON THE FUNDAMENTAL LEMMA OF NEYMAN AND PEARSON¹
By George B. Dantzig and Abraham Wald*
Department of the Air Force and Columbia University 


1. Summary and introduction. The following lemma proved by Neyman and Pearson [1] is basic in the theory of testing statistical hypotheses:

Lemma. Let $f_1(x), \ldots, f_{m+1}(x)$ be $m + 1$ Borel measurable functions defined over a finite dimensional Euclidean space $R$ such that $\int_R |f_i(x)|\,dx < \infty$ $(i = 1, \ldots, m + 1)$. Let, furthermore, $c_1, \ldots, c_m$ be $m$ given constants and $\mathcal{S}$ the class of all Borel measurable subsets $S$ of $R$ for which

(1.1) $\int_S f_i(x)\,dx = c_i \qquad (i = 1, \ldots, m).$

Let, finally, $\mathcal{S}_0$ be the subclass of $\mathcal{S}$ consisting of all members $S_0$ of $\mathcal{S}$ for which

(1.2) $\int_{S_0} f_{m+1}(x)\,dx \ge \int_S f_{m+1}(x)\,dx \qquad \text{for all } S \text{ in } \mathcal{S}.$

If $S$ is a member of $\mathcal{S}$ and if there exist $m$ constants $k_1, \ldots, k_m$ such that

(1.3) $f_{m+1}(x) \ge k_1 f_1(x) + \cdots + k_m f_m(x)$ when $x \in S$,
(1.4) $f_{m+1}(x) \le k_1 f_1(x) + \cdots + k_m f_m(x)$ when $x \notin S$,

then $S$ is a member of $\mathcal{S}_0$.
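A minimal numerical sketch of the lemma in its simplest case may help; it is not part of the paper. Assume, hypothetically, $m = 1$, $R$ the real line, $f_1$ the N(0, 1) density and $f_2$ the N($\theta$, 1) density with $\theta > 0$; the function names `norm_cdf`, `norm_quantile` and `neyman_pearson_region` are our own. Under these assumptions the set defined by (1.3) and (1.4) is a right tail, and the code checks that it satisfies (1.1) and gives a larger $\int_S f_2\,dx$ than another member of $\mathcal{S}$.

```python
import math


def norm_cdf(z: float) -> float:
    """Standard normal distribution function, written with math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_quantile(prob: float) -> float:
    """Invert norm_cdf by bisection on [-10, 10] (ample for this sketch)."""
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if norm_cdf(mid) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def neyman_pearson_region(c1: float, theta: float):
    """Hypothetical case m = 1: f1 = N(0, 1) density, f2 = N(theta, 1) density, theta > 0.

    f2/f1 = exp(theta*x - theta**2/2) increases with x, so the set where (1.3)
    holds is a right tail {x >= t}; t is fixed by the constraint (1.1).
    """
    t = norm_quantile(1.0 - c1)
    k1 = math.exp(theta * t - theta ** 2 / 2.0)  # f2(t) = k1 * f1(t) on the boundary
    size = 1.0 - norm_cdf(t)                     # integral of f1 over S; should equal c1
    power = 1.0 - norm_cdf(t - theta)            # integral of f2 over S
    return t, k1, size, power


t, k1, size, power = neyman_pearson_region(0.05, 1.0)
# A competing member of the class: the left tail {x <= -t} has the same f1-integral.
left_power = norm_cdf(-t - 1.0)
print(f"t={t:.4f} k1={k1:.4f} size={size:.4f} power={power:.4f} left={left_power:.4f}")
```

With $c_1 = 0.05$ and $\theta = 1$ the boundary lies near $t \approx 1.645$ and $\int_S f_2\,dx$ is roughly 0.26, whereas the left tail with the same $f_1$-integral gives well under 0.01.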

The above lemma gives merely a sufficient condition for a member $S$ of $\mathcal{S}$ to be also a member of $\mathcal{S}_0$. Two important questions were left open by Neyman and Pearson: (1) the question of existence, that is, the question whether $\mathcal{S}_0$ is non-empty whenever $\mathcal{S}$ is non-empty; (2) the question of necessity of their sufficient condition (apart from the obvious weakening that (1.3) and (1.4) may be violated on a set of measure zero).

The purpose of the present note is to answer the above two questions. It will be shown in Section 2 that $\mathcal{S}_0$ is not empty whenever $\mathcal{S}$ is not empty. In Section 3, a necessary and sufficient condition is given for a member of $\mathcal{S}$ to be also a member of $\mathcal{S}_0$. This necessary and sufficient condition coincides with the Neyman-Pearson sufficient condition under a mild restriction.

2. Proof that $\mathcal{S}_0$ is not empty whenever $\mathcal{S}$ is not empty. Each function $f_i(x)$ determines a finite measure $\mu_i$ given by the equation


(2.1) $\mu_i(S) = \int_S f_i(x)\,dx \qquad (i = 1, 2, \ldots, m + 1).$


1 The main results of this paper were obtained by the authors independently of each 
other using entirely different methods. 
* Research under contract with the Office of Naval Research. 




88 GEORGE B. DANTZIG AND ABRAHAM WALD 


Let $\mu$ be the vector measure with the components $\mu_1, \ldots, \mu_{m+1}$; i.e., for any measurable set $S$ the value of $\mu(S)$ is the vector $(\mu_1(S), \ldots, \mu_{m+1}(S))$. Thus, for each $S$ the value of $\mu(S)$ can be represented by a point in the $(m + 1)$-dimensional Euclidean space $E$. A point $g = (g_1, \ldots, g_{m+1})$ of $E$ is said to belong to the range of the vector measure $\mu$ if and only if there exists a measurable subset $S$ of $R$ such that $\mu(S) = g$.

It was proved by Lyapunov [2] (see also [4]) that the range $M$ of $\mu$ is a bounded, closed and convex subset of $E$ (each $\mu_i$ is absolutely continuous with respect to Lebesgue measure and therefore has no atoms, which is what Lyapunov's theorem requires). Let $L$ be the line in $E$ which is parallel to the $(m + 1)$-th axis and goes through the point $(c_1, c_2, \ldots, c_m, 0)$. Suppose that $\mathcal{S}$ is not empty. Then the intersection $M^*$ of $L$ with $M$ is not empty. Because of Lyapunov's theorem, $M^*$ is a finite closed interval (which may reduce to a single point). There exists a subset $S$ of $R$ such that $\mu(S)$ is equal to the upper end point of $M^*$. Clearly, $S$ is a member of $\mathcal{S}_0$.
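The construction of Section 2 can be imitated on a grid. The sketch below is our own; it reuses hypothetical normal densities ($f_1$ = N(0, 1), $f_2$ = N($\theta$, 1)), and its names and grid parameters are chosen only for illustration. For $m = 1$ it approximates the upper end point of $M^*$ by giving each grid cell a weight in $[0, 1]$, i.e. a randomized set, and solving the resulting fractional knapsack problem greedily. At most one cell is split, and its $\mu_1$-mass shrinks with the cell width. This is consistent with Lyapunov's theorem: with nonatomic measures the upper end point is attained by an ordinary, nonrandomized set.

```python
import math


def normal_pdf(x: float, mean: float = 0.0) -> float:
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def discretized_upper_endpoint(c1, theta=1.0, lo=-8.0, hi=8.0, cells=16000):
    """Approximate max of mu_2(S) subject to mu_1(S) = c1 (the upper end of M*).

    [lo, hi] is cut into equal cells; each cell gets a weight w_j in [0, 1],
    i.e. a "randomized" set.  Filling cells in decreasing order of mu2/mu1
    solves this fractional-knapsack problem, and at most one cell ends up
    with a weight strictly between 0 and 1.
    """
    width = (hi - lo) / cells
    mids = [lo + (j + 0.5) * width for j in range(cells)]
    mu1 = [normal_pdf(x) * width for x in mids]          # midpoint rule for mu_1
    mu2 = [normal_pdf(x, theta) * width for x in mids]   # midpoint rule for mu_2
    order = sorted(range(cells), key=lambda j: mu2[j] / mu1[j], reverse=True)
    used = upper = split_weight = 0.0
    for j in order:
        remaining = c1 - used
        if mu1[j] <= remaining:
            used += mu1[j]          # the whole cell goes into S
            upper += mu2[j]
        else:
            split_weight = remaining / mu1[j]   # the single fractional cell
            used += split_weight * mu1[j]
            upper += split_weight * mu2[j]
            break
    return used, upper, split_weight, width


used, upper, split, width = discretized_upper_endpoint(0.05)
print(used, upper, split, width)
```

Halving the cell width roughly halves the largest possible mass of the split cell, so the randomized part can be made as small as desired.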

3. Necessary and sufficient condition that a member of $\mathcal{S}$ be also a member of $\mathcal{S}_0$. Let $\nu(S)$ be the vector measure with the components $\mu_1(S), \ldots, \mu_m(S)$. According to the aforementioned theorem of Lyapunov, the range $N$ of $\nu$ is a bounded, closed and convex subset of the $m$-dimensional Euclidean space.

By the dimension of a convex subset $Q$ of a finite dimensional Euclidean space we shall mean the dimension of the smallest dimensional hyperplane that contains $Q$. A point $q$ of a convex set $Q$ is said to be an interior point of $Q$ if there exists a sphere $V$ with center at $q$ and positive radius such that $V \cap \Pi \subset Q$, where $\Pi$ is the smallest dimensional hyperplane containing $Q$. Any point $q$ that is not an interior point of $Q$ will be called a boundary point. We shall now prove the following theorem.

THEOREM 3.1. If $(c_1, \ldots, c_m)$ is an interior point of $N$, then a necessary and sufficient condition for a member $S$ of $\mathcal{S}$ to be a member of $\mathcal{S}_0$ is that there exist $m$ constants $k_1, \ldots, k_m$ such that (1.3) and (1.4) hold for all $x$ except perhaps on a set of measure zero.

PROOF. The Neyman-Pearson lemma cited in Section 1 states that our condition is sufficient. Thus, we merely have to prove the necessity of our condition. Assume that $(c_1, \ldots, c_m)$ is an interior point of $N$. Let $c^*$ be the largest value for which $(c_1, \ldots, c_m, c^*) \in M$, and $c^{**}$ the smallest value for which

$(c_1, \ldots, c_m, c^{**}) \in M.$

We shall first consider the case when $c^* = c^{**}$. Let $(\bar c_1, \ldots, \bar c_m)$ be any other interior point of $N$. We shall show that there exists exactly one real value $\bar c$ such that $(\bar c_1, \ldots, \bar c_m, \bar c) \in M$. For suppose that there are two different values $\bar c^*$ and $\bar c^{**}$ such that both $(\bar c_1, \ldots, \bar c_m, \bar c^*)$ and $(\bar c_1, \ldots, \bar c_m, \bar c^{**})$ are in $M$. Since $(c_1, \ldots, c_m)$ and $(\bar c_1, \ldots, \bar c_m)$ are interior points of $N$, there exists a point $(\tilde c_1, \ldots, \tilde c_m)$ in $N$ such that $(c_1, \ldots, c_m)$ lies in the interior of the segment determined by $(\tilde c_1, \ldots, \tilde c_m)$ and $(\bar c_1, \ldots, \bar c_m)$. There exists a real value $c'$ such that $(\tilde c_1, \ldots, \tilde c_m, c') \in M$. Consider the convex set $T$ determined by the 3 points $(\bar c_1, \ldots, \bar c_m, \bar c^*)$, $(\bar c_1, \ldots, \bar c_m, \bar c^{**})$ and $(\tilde c_1, \ldots, \tilde c_m, c')$. Obviously, $T \subset M$. But $T$ contains points $(c_1, \ldots, c_m, h)$ and $(c_1, \ldots, c_m, h')$ with





FUNDAMENTAL LEMMA 89 


$h \ne h'$, contrary to our assumption that $c^* = c^{**}$. Thus, for any interior point $(\bar c_1, \ldots, \bar c_m)$ of $N$ there exists exactly one real value $\bar c$ such that $(\bar c_1, \ldots, \bar c_m, \bar c) \in M$. Since $M$ is closed and convex, this remains true also when $(\bar c_1, \ldots, \bar c_m)$ is a boundary point of $N$. Thus, there exists a single valued function $\varphi(g_1, \ldots, g_m)$ such that $g_{m+1} = \varphi(g_1, \ldots, g_m)$ holds for all points $g = (g_1, \ldots, g_m, g_{m+1}) \in M$. Since $M$ is convex, $\varphi$ must be linear; i.e., $\varphi(g_1, \ldots, g_m) = \sum_{i=1}^m k_i g_i + k_0$. Since the origin is obviously contained in $M$ (it is $\mu$ of the empty set), we have $k_0 = 0$. Thus, we have $g_{m+1} = \sum_{i=1}^m k_i g_i$ for all points $g$ in $M$. But then $f_{m+1}(x) = \sum_{i=1}^m k_i f_i(x)$ must hold for all $x$, except perhaps on a set of measure zero. Thus, for any subset $S$ of $R$, the inequalities (1.3) and (1.4) are fulfilled for all $x$, except perhaps on a set of measure zero. This completes the proof of our theorem in the case when $c^* = c^{**}$.

We shall now consider the case when $c^{**} < c^*$. Let $c$ be any value between $c^{**}$ and $c^*$; i.e., $c^{**} < c < c^*$. We shall show that $(c_1, \ldots, c_m, c)$ is an interior point of $M$. For this purpose, consider a finite set of points $c^i = (c_1^i, \ldots, c_m^i)$ in $N$ $(i = 1, \ldots, n)$ such that $c^1, \ldots, c^n$ are linearly independent, the simplex determined by $c^1, \ldots, c^n$ has the same dimension as $N$ and contains the point $(c_1, \ldots, c_m)$ in its interior. Such points $c^i$ in $N$ obviously exist. There exist real values $h_i$ $(i = 1, \ldots, n)$ such that $(c_1^i, \ldots, c_m^i, h_i) \in M$ $(i = 1, \ldots, n)$. Let $T$ be the smallest convex set containing the points $(c_1^i, \ldots, c_m^i, h_i)$ $(i = 1, \ldots, n)$, $(c_1, \ldots, c_m, c^*)$ and $(c_1, \ldots, c_m, c^{**})$. Clearly, the dimension of $T$ is the same as that of $M$ and $(c_1, \ldots, c_m, c)$ is an interior point of $T$. Thus, $(c_1, \ldots, c_m, c)$ is an interior point of $M$. The point $(c_1, \ldots, c_m, c^*)$ is obviously a boundary point of $M$. Let $g = (g_1, \ldots, g_{m+1})$ be the generic designation of a point in the $(m + 1)$-dimensional Euclidean space $E$. Since $(c_1, \ldots, c_m, c^*)$ is a boundary point of $M$, there exists an $m$-dimensional hyperplane $\Pi$ through $(c_1, \ldots, c_m, c^*)$ such that $\Pi$ contains only boundary points of $M$ and $M$ lies entirely on one side of $\Pi$.* Let the equation of $\Pi$ be given by


(3.1) $k_{m+1} g_{m+1} - \sum_{i=1}^m k_i g_i = k_{m+1} c^* - \sum_{i=1}^m k_i c_i.$


Since $\Pi$ contains only boundary points of $M$, and since $(c_1, \ldots, c_m, c)$ is not a boundary point when $c^{**} < c < c^*$, the hyperplane $\Pi$ cannot be parallel to the $(m + 1)$-th coordinate axis; i.e., $k_{m+1} \ne 0$. We can assume without loss of generality that $k_{m+1} = 1$. Since $M$ lies entirely on one side of $\Pi$, and since for $(g_1, \ldots, g_m, g_{m+1}) = (c_1, \ldots, c_m, c^{**})$ the left hand member of (3.1) is smaller than the right hand member, we must have


(3.2) $g_{m+1} - \sum_{i=1}^m k_i g_i \le c^* - \sum_{i=1}^m k_i c_i$


for all $g \in M$. Let $S$ be a subset of $R$ such that


* This follows from well known results on convex bodies. See, for example, [3], p. 0. 







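(3.3) $(\mu_1(S), \ldots, \mu_m(S), \mu_{m+1}(S)) = (c_1, \ldots, c_m, c^*).$


It can easily be seen that (3.2) and (3.3) can be fulfilled simultaneously only if $S$ satisfies the conditions (1.3) and (1.4) for all $x$, except perhaps on a set of measure zero. Indeed, put $h(x) = f_{m+1}(x) - \sum_{i=1}^m k_i f_i(x)$. By (3.2) and (3.3), $\int_A h(x)\,dx \le \int_S h(x)\,dx$ for every measurable $A$, and the choices $A = S \cup \{x : h(x) > 0\}$ and $A = S - \{x : h(x) < 0\}$ show that $\{x : h(x) > 0\} - S$ and $S \cap \{x : h(x) < 0\}$ are sets of measure zero. This completes the proof of our theorem.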

It remains to investigate the case when $(c_1, \ldots, c_m)$ is a boundary point of $N$. For this purpose, we shall introduce some definitions and prove some lemmas.

Let $\xi = (\xi_1, \ldots, \xi_m)$ be an $m$-dimensional vector with real valued components at least one of which is not zero. We shall say that $\xi$ is maximal relative to the point $c = (c_1, \ldots, c_m)$ if

(3.4) $\sum_{j=1}^m \xi_j g_j \le \sum_{j=1}^m \xi_j c_j$

for all points $(g_1, \ldots, g_m)$ in $N$.

We shall say that a set $\{\xi^i\}$ $(i = 1, 2, \ldots, r;\ r > 1)$ of vectors is maximal relative to the point $c = (c_1, \ldots, c_m)$ if the set $\{\xi^i\}$ $(i = 1, \ldots, r - 1)$ is maximal relative to $c$, not all components of $\xi^r$ are zero and

(3.5) $\sum_{j=1}^m \xi_j^r g_j \le \sum_{j=1}^m \xi_j^r c_j$

holds for all points $(g_1, \ldots, g_m)$ of $N$ for which

(3.6) $\sum_{j=1}^m \xi_j^i g_j = \sum_{j=1}^m \xi_j^i c_j \qquad (i = 1, \ldots, r - 1).$

A set of vectors $\{\xi^i\}$ $(i = 1, \ldots, r)$ is said to be a complete maximal set relative to $c = (c_1, \ldots, c_m)$ if $\{\xi^i\}$ $(i = 1, 2, \ldots, r)$ is maximal relative to $c$ and no vector $\xi^{r+1}$ exists such that $\xi^{r+1}$ is linearly independent of the sequence $(\xi^1, \ldots, \xi^r)$ and $(\xi^1, \ldots, \xi^r, \xi^{r+1})$ is maximal relative to $c$.

LEMMA 3.1. If $c = (c_1, \ldots, c_m)$ is a boundary point of $N$, then there exists a positive integer $r$ and a set $\{\xi^1, \ldots, \xi^r\}$ of vectors that is a complete maximal set relative to $c$.

PROOF. Since $c$ is a boundary point of $N$, there exists an $(m - 1)$-dimensional hyperplane $\Pi$ through $c$ such that $N$ lies entirely on one side of $\Pi$. Let the equation of $\Pi$ be given by

$\sum_{j=1}^m \bar\xi_j g_j = \sum_{j=1}^m \bar\xi_j c_j.$

Since $N$ lies entirely on one side of $\Pi$, either $\sum_{j=1}^m \bar\xi_j g_j \ge \sum_{j=1}^m \bar\xi_j c_j$ for all points $(g_1, \ldots, g_m)$ in $N$, or $\sum_{j=1}^m \bar\xi_j g_j \le \sum_{j=1}^m \bar\xi_j c_j$ for all $(g_1, \ldots, g_m)$ in $N$. We put $\xi^1 = -\bar\xi$ if $\sum_j \bar\xi_j g_j \ge \sum_j \bar\xi_j c_j$ for all points $(g_1, \ldots, g_m)$ in $N$. Otherwise, we put $\xi^1 = \bar\xi$. Clearly, $\xi^1$ is maximal relative to $c$. If $\xi^1$ is not a complete maximal set relative to $c$, there exists a vector $\xi^2$ such that $\xi^2$ is linearly independent of $\xi^1$ and $(\xi^1, \xi^2)$ is maximal relative to $c$. If $(\xi^1, \xi^2)$ is not a complete maximal set, we can find a vector $\xi^3$ such that $\xi^3$ is linearly independent of $(\xi^1, \xi^2)$ and $(\xi^1, \xi^2, \xi^3)$ is a maximal set relative to $c$, and so on. Continuing this procedure, we shall arrive at a set $(\xi^1, \ldots, \xi^r)$ $(r \le m)$ that is a complete maximal set relative to $c$. This completes the proof of Lemma 3.1.

LEMMA 3.2. If $(\xi^1, \ldots, \xi^r)$ is a maximal set of vectors relative to $c = (c_1, \ldots, c_m)$ and if $\nu(S) = c$, then the following two conditions are fulfilled for all $x$ (except perhaps on a set of measure zero):

(a) If $x$ is a point in $R$ for which $\sum_{j=1}^m \xi_j^i f_j(x) = 0$ for $i = 1, 2, \ldots, u - 1$ and $\sum_{j=1}^m \xi_j^u f_j(x) > 0$ $(u = 1, 2, \ldots, r)$, then $x \in S$.
(b) If $x$ is a point of $R$ for which $\sum_{j=1}^m \xi_j^i f_j(x) = 0$ for $i = 1, 2, \ldots, u - 1$ and $\sum_{j=1}^m \xi_j^u f_j(x) < 0$, then $x \notin S$.

PROOF. Assume that $(\xi^1, \ldots, \xi^r)$ is maximal relative to $c$. Then $\xi^1$ is maximal relative to $c$; since $\nu(A) \in N$ for every measurable $A$ and $\nu(S) = c$, (3.4) gives $\int_A \sum_j \xi_j^1 f_j(x)\,dx \le \int_S \sum_j \xi_j^1 f_j(x)\,dx$ for every measurable $A$. This implies that for all $x$ (except perhaps on a set of measure zero) the following condition holds: $x \in S$ when $\sum_{j=1}^m \xi_j^1 f_j(x) > 0$ and $x \notin S$ when $\sum_{j=1}^m \xi_j^1 f_j(x) < 0$. Thus, conditions (a) and (b) of our lemma must be fulfilled for $u = 1$. We shall now show that if (a) and (b) hold for $u = 1, \ldots, v$ then (a) and (b) must hold also for $u = v + 1$. For this purpose, consider the set $R'$ of all points $x$ for which $\sum_{j=1}^m \xi_j^i f_j(x) = 0$ for $i = 1, \ldots, v$. If $R$ is replaced by $R'$, then $\xi^{v+1}$ is maximal relative to $c' = (c'_1, \ldots, c'_m)$ where $c'_i = \int_{S'} f_i(x)\,dx$ and $S' = S \cap R'$. Hence, for any $x$ in $R'$ (except perhaps on a set of measure zero) the following condition holds: $x \in S$ when $\sum_{j=1}^m \xi_j^{v+1} f_j(x) > 0$ and $x \notin S$ when $\sum_{j=1}^m \xi_j^{v+1} f_j(x) < 0$. But this implies that (a) and (b) hold for $u = v + 1$. This completes the proof of our lemma.
LEMMA 3.3. Let $(\xi^1, \ldots, \xi^r)$ be a complete maximal set of vectors relative to $c = (c_1, \ldots, c_m)$, and let $T$ be the set of all points $g = (g_1, \ldots, g_m)$ of $N$ for which $\sum_{j=1}^m \xi_j^i g_j = \sum_{j=1}^m \xi_j^i c_j$ for $i = 1, 2, \ldots, r$. Then $T$ is a bounded, closed and convex set and $c$ is an interior point of $T$.

PROOF. Clearly, $T$ is a bounded, closed and convex set. Suppose that $c$ is a boundary point of $T$. Then there exists a hyperplane $\Pi$ of dimension $m - 1$ such that $\Pi$ goes through $c$, $\Pi$ contains only boundary points of $T$ and $T$ lies entirely on one side of $\Pi$. Let the equation of $\Pi$ be given by

$\sum_{j=1}^m \bar\xi_j g_j = \sum_{j=1}^m \bar\xi_j c_j,$

where $\bar\xi$ is independent of $\xi^1, \ldots, \xi^r$. Since $T$ lies on one side of $\Pi$, we have either $\sum_j \bar\xi_j g_j \ge \sum_j \bar\xi_j c_j$ for all $g = (g_1, \ldots, g_m)$ in $T$, or $\sum_j \bar\xi_j g_j \le \sum_j \bar\xi_j c_j$ for all $g$ in $T$. Let $\xi_j^{r+1} = \bar\xi_j$ $(j = 1, \ldots, m)$ in the latter case, and $\xi_j^{r+1} = -\bar\xi_j$ in the former case. Then $\sum_j \xi_j^{r+1} g_j \le \sum_j \xi_j^{r+1} c_j$ for all $g$ in $T$. But then $(\xi^1, \ldots, \xi^r, \xi^{r+1})$ is a maximal set relative to $c$, contrary to our assumption that $(\xi^1, \ldots, \xi^r)$ is a complete maximal set. Thus, $c$ must be an interior point of $T$ and our lemma is proved.

THEOREM 3.2. If $c = (c_1, \ldots, c_m)$ is a boundary point of $N$ and if $(\xi^1, \ldots, \xi^r)$ is a complete maximal set of vectors relative to $c$, then a necessary and sufficient condition for a member $S$ of $\mathcal{S}$ to be a member of $\mathcal{S}_0$ is that there exist $m$ constants $k_1, \ldots, k_m$ such that for all $x$ in $R'$ (except perhaps on a set of measure zero) the inequalities (1.3) and (1.4) hold, where $R'$ is the set of all points $x$ for which

$\sum_{j=1}^m \xi_j^i f_j(x) = 0 \qquad \text{for } i = 1, 2, \ldots, r.$

PROOF. Suppose that $c = (c_1, \ldots, c_m)$ is a boundary point of $N$ and that $(\xi^1, \ldots, \xi^r)$ is a complete maximal set of vectors relative to $c$. Let $R^*$ be the set of all points $x$ for which the following two conditions hold: (1) $\sum_{j=1}^m \xi_j^i f_j(x) \ne 0$ for at least one value $i$; (2) $\sum_{j=1}^m \xi_j^{i_0} f_j(x) > 0$, where $i_0$ is the smallest integer $i$ for which $\sum_{j=1}^m \xi_j^i f_j(x) \ne 0$. For any member $S$ of $\mathcal{S}$ let $S^*$ denote the intersection of $S$ with $R - R'$. It follows from Lemma 3.2 that $R^* - R^* \cap S^*$ and $S^* - R^* \cap S^*$ are sets of measure zero. Thus

(3.7) $\int_{S^*} f_i(x)\,dx = \int_{R^*} f_i(x)\,dx \qquad (i = 1, \ldots, m + 1)$

for all $S \in \mathcal{S}$. Let

(3.8) $f_i^*(x) = f_i(x)$ for $x \in R'$

and

(3.9) $f_i^*(x) = 0$ for $x \in R - R'$.

Let, furthermore,

(3.10) $c_i^* = c_i - \int_{R^*} f_i(x)\,dx.$

Let $\mu^*, \nu^*, M^*, N^*, \mathcal{S}^*$ and $\mathcal{S}_0^*$ have the same meaning with reference to the functions $f_1^*(x), \ldots, f_{m+1}^*(x)$ and the point $c^* = (c_1^*, \ldots, c_m^*)$ as $\mu, \nu, M, N, \mathcal{S}$ and $\mathcal{S}_0$ have with reference to the functions $f_1(x), \ldots, f_{m+1}(x)$ and the point $c = (c_1, \ldots, c_m)$.





It follows from Lemma 3.2 that for any subset $S$ of $R$ for which $\nu(S)$ is a point of the set $T$ defined in Lemma 3.3 we have

$\int_S f_i(x)\,dx = \int_S f_i^*(x)\,dx + \int_{R^*} f_i(x)\,dx \qquad (i = 1, \ldots, m + 1).$

Since the range of $\nu^*(S)$ is equal to $N^*$ even when $S$ is restricted to subsets $S$ for which $\nu(S) \in T$, the set $N^*$ is obtained from the set $T$ by a translation. The same translation brings the point $c = (c_1, \ldots, c_m)$ into $c^* = (c_1^*, \ldots, c_m^*)$. It then follows from Lemma 3.3 that $c^*$ is an interior point of $N^*$. Application of Theorem 3.1 gives the following necessary and sufficient condition for a member $S$ of $\mathcal{S}^*$ to be a member of $\mathcal{S}_0^*$: There exist $m$ constants $k_1, \ldots, k_m$ such that for all $x$ (except perhaps on a set of measure zero)

(3.11) $f_{m+1}^*(x) \ge k_1 f_1^*(x) + \cdots + k_m f_m^*(x)$ when $x \in S$

and

(3.12) $f_{m+1}^*(x) \le k_1 f_1^*(x) + \cdots + k_m f_m^*(x)$ when $x \notin S$.

It follows from (3.8) and (3.9) that (3.11) and (3.12) are equivalent to

(3.13) $f_{m+1}(x) \ge k_1 f_1(x) + \cdots + k_m f_m(x)$ when $x \in S \cap R'$

and

(3.14) $f_{m+1}(x) \le k_1 f_1(x) + \cdots + k_m f_m(x)$ when $x \in (R - S) \cap R'$.

Theorem 3.2 follows from this and the fact that every member $S$ of $\mathcal{S}$ is a member of $\mathcal{S}^*$ and that a member $S$ of $\mathcal{S}$ is a member of $\mathcal{S}_0$ if and only if $S$ is a member of $\mathcal{S}_0^*$.

It may be of interest to note that if the set $R'$ is of measure zero, the members of $\mathcal{S}$ can differ from each other only by sets of measure zero; i.e., $\mathcal{S}$ consists essentially of one element. This is an immediate consequence of Lemma 3.2.
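A small worked case shows why Theorem 3.2 restricts the condition to $R'$. The example is our own and not from the paper. Take $m = 1$, $R$ the real line, $f_1$ the uniform density on $[0, 1]$, any integrable $f_2$, and $c_1 = 1$:

```latex
\begin{align*}
N &= \Bigl\{\textstyle\int_S f_1(x)\,dx : S \subset R\Bigr\} = [0, 1],
  \qquad c_1 = 1 \text{ is a boundary point of } N,\\
\xi^1 &= (1): \quad g_1 \le c_1 \text{ for all } g_1 \in N
  \quad (\text{maximal; complete, since } r \le m = 1),\\
R' &= \{x : f_1(x) = 0\} = R - [0, 1],\\
S \in \mathcal{S} &\implies [0, 1] \subset S \quad \text{(Lemma 3.2, up to a set of measure zero)},\\
S \in \mathcal{S}_0 &\iff f_2 \ge 0 \text{ on } S \cap R' \text{ and } f_2 \le 0 \text{ on } R' - S
  \quad \text{(Theorem 3.2, a.e.)}.
\end{align*}
```

Since $k_1 f_1 = 0$ on $R'$, (1.3) and (1.4) reduce there to sign conditions on $f_2$, and nothing is required of $f_2$ on $[0, 1]$. If, for instance, $f_2(x) = -x^{-1/2}$ on $(0, 1]$ (and $f_2$ is integrable elsewhere), no constant $k_1$ makes (1.3) hold on all of $[0, 1]$. Yet $\mathcal{S}_0$ is not empty, by Section 2.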


REFERENCES 

[1] J. Neyman and E. S. Pearson, "Contributions to the theory of testing statistical hypotheses," Stat. Res. Memoirs, Vol. 1 (1936), pp. 1-37.

[2] A. Lyapunov, "Sur les fonctions-vecteurs complètement additives," Izvestiya Akad. Nauk SSSR. Ser. Mat., Vol. 4 (1940), pp. 465-478.

[3] T. Bonnesen and W. Fenchel, Theorie der konvexen Körper, Chelsea Publishing Company, New York, 1948.

[4] P. R. Halmos, "The range of a vector measure," Bull. Am. Math. Soc., Vol. 54 (1948), pp. 416-421.





ESTIMATORS OF THE PROBABILITY OF THE ZERO CLASS IN POISSON 
AND CERTAIN RELATED POPULATIONS 


By N. L. JoHNson 


University College, London 


1. Summary and conclusions. Two estimators of the probability of falling into the zero class are compared, for a family of populations related to Poisson populations. The first estimator, $\epsilon_1$, is based on the observed proportion in the zero class; the second, $\epsilon_2$, would be the maximum likelihood estimator if the underlying distribution were Poisson.

From a practical point of view each estimator possesses its own peculiar advantages. $\epsilon_1$ has the advantage that the detailed distribution among the non-zero classes need not be examined. $\epsilon_2$ has the advantage that only the mean of the observations is needed, the distribution among the various classes not being required. The relative importance of these advantages will naturally vary according to the situations in which the estimators are to be used.

An arbitrary measure of relative accuracy, the mean square error ratio, is used. On this basis $\epsilon_2$ is superior to $\epsilon_1$ for all sample sizes (greater than one) if the population distribution is Poisson. Provided the sample size is not too large $\epsilon_2$ may still be superior to $\epsilon_1$ when the population distribution deviates to a moderate extent from Poisson form.

A third estimator $\epsilon_3$, which is a modification of $\epsilon_2$ and is unbiased, provided the population is Poisson, may be preferred to $\epsilon_2$ unless $p$ exceeds about 0.45. Its properties vis-à-vis $\epsilon_1$ probably differ little from those of $\epsilon_2$.


2. The problem. The following investigation was suggested by a problem 
which arose frequently in connection with the study of weapon lethality in the 
course of wartime operational and development research. When a fragmenting 
shell or bomb bursts at a given distance from a target, the density of strikes will 
vary according to the angular direction with regard to the equatorial plane of 
the shell. Within the main fragment belt, however, the density may be regarded 
as varying locally in a random way about an average value. The practical 
requirement is to determine the chance, say $q$, that at least one potentially lethal or effective fragment will strike an area of given size which we may call the 'unit area'. Alternatively we can estimate $p = 1 - q$, the chance that no such fragment will strike the unit area.

If it is assumed that the distribution of effective hits follows the Poisson law, and in certain cases evidence indicated that this was justifiable, then $q = 1 - e^{-m}$ and $p = e^{-m}$, where $m$ is the expected value of the number of strikes on the unit area. It was therefore customary to estimate $m$ from the observed average number of effective hits, $\bar v$ say, per unit area, derived from a series of experimental firings. Then $q$ was estimated by the formula $1 - e^{-\bar v}$. If the distribution






PROBABILITY OF ZERO CLASS 95 


departs from the Poisson form, the procedure is clearly incorrect in theory, but in practice the data were often inadequate to establish any alternative form of the distribution law and the estimator $1 - e^{-\bar v}$ was still used. In the discussion below we shall be concerned with the relative accuracy of two alternative estimators of $p\ (= 1 - q)$ (one of the estimators being $e^{-\bar v}$),

(a) when the distribution follows the Poisson law; 

(b) when it departs from this law, but can be represented by a positive or 
negative binomial. 

3. Properties of the two estimators. The problem may be stated formally as follows: $v_1, v_2, \ldots, v_n$ are independent discrete random variables. If $n_0$ be the number of zero values out of the $n$ values then


(1) $\epsilon_1 = n_0/n$


may be used as an estimator of $p$, the probability of the zero class. $\epsilon_1$ is, in fact, the usual form of estimator for the proportion of individuals falling into a given class, and is of general application.

The estimator of $p$ described in section 2 is


(2) $\epsilon_2 = e^{-\bar v},$


where $\bar v = n^{-1} \sum_{i=1}^n v_i$. This estimator is based on the assumption of a common Poisson distribution for the $v$'s.

It will be noted that, while the evaluation of the estimator $\epsilon_2$ does not require a knowledge of the values of the separate $v$'s (provided their total or average is known), $\epsilon_1$ requires only a knowledge of the number of $v$'s which are zero. In the case described in section 2, $\epsilon_2$ is often appropriate as the separate values of the $v$'s are not known though their total is known. On the other hand, if, for example, $v_1, v_2, \ldots, v_n$ represent the number of cells developing in a given time in a number of cultures, it may be possible to observe only $n_0$, the number of cases where no development has occurred. In such cases Fisher [1] has considered the inverse problem of estimating $m$ from $n_0$ by the formula $-\log \epsilon_1$. This problem will not be considered in the present paper.

We shall now compare the estimators $\epsilon_1$ and $\epsilon_2$ in the case when the $v$'s do, in fact, each follow a Poisson distribution with expected value $m$, so that


(3) $\Pr\{v = r\} = \dfrac{e^{-m} m^r}{r!} \qquad (r = 0, 1, 2, \ldots).$


The probability of the zero class is
(4) $p = \Pr\{v = 0\} = e^{-m}.$


Since $n_0$ is a binomial variable with probability $p$ and index $n$, the moments and moment-ratios of $\epsilon_1$ are easily determined. In regard to $\epsilon_2$, it can be shown that


(5) $\mu'_s(\epsilon_2) = p^{\,n f(s,n)},$




96 N. L. JOHNSON 


where $\mu'_s(\epsilon_2)$ denotes the $s$-th moment of $\epsilon_2$ about zero and
(6) $f(s, n) = 1 - e^{-s/n}.$


$\epsilon_1$ is an unbiased estimator of $p$ while $\epsilon_2$ is biased. Numerical calculation shows that this bias is negligible for most practical purposes (the maximum absolute bias is in the range $p = 0.3$-$0.4$ and is approximately $+0.18/n$). For all values of $p$ the relation
(7) $\lim_{n \to \infty} E(\epsilon_2) = p$
holds.
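The moment formula (5)-(6) is easy to evaluate directly. The sketch below, with function names of our own choosing, computes the exact bias of $\epsilon_2$ and checks the magnitude quoted above. The first-order term $-p\log p/(2n)$ mentioned in the comments comes from our own expansion $n f(1, n) = 1 - 1/(2n) + \cdots$, not from the paper.

```python
import math


def f(s: int, n: int) -> float:
    """f(s, n) = 1 - exp(-s/n) of eq. (6); expm1 avoids cancellation for large n."""
    return -math.expm1(-s / n)


def moment_eps2(s: int, n: int, p: float) -> float:
    """Eq. (5): s-th moment about zero of eps2 = exp(-vbar), Poisson population."""
    return p ** (n * f(s, n))


def bias_eps2(n: int, p: float) -> float:
    return moment_eps2(1, n, p) - p


# First-order term of the bias (our own expansion): bias ~ -p log p / (2n),
# largest at p = 1/e, where it equals 1/(2e n), about 0.184/n.
grid = [j / 1000 for j in range(1, 1000)]
p_worst = max(grid, key=lambda p: bias_eps2(50, p))
print(p_worst, 50 * bias_eps2(50, p_worst))
```

The bias is positive for every $p$ because $n f(1, n) < 1$, so $p^{n f(1,n)} > p$.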

4. Comparison of the estimators. Since $\epsilon_2$ is a biased estimator of $p$, the comparison of $\epsilon_1$ and $\epsilon_2$ certainly cannot be based simply on their variances. One method of comparison, which does make some allowance for biases, is to use the mean square errors of the estimators [2]. The mean square error of $\epsilon_2$ is $E[(\epsilon_2 - p)^2] = \sigma^2(\epsilon_2) + [E(\epsilon_2) - p]^2$, while the mean square error of $\epsilon_1$ is $E[(\epsilon_1 - p)^2] = \sigma^2(\epsilon_1)$ since $\epsilon_1$ is an unbiased estimator of $p$. The ratio of mean square errors will be used as an index of comparison of estimators in the present paper, although it is clearly arbitrary, and other criteria could be preferable in certain circumstances.


TABLE I
Ratio of mean square error of $\epsilon_2$ to mean square error of $\epsilon_1$ (Poisson population)

   p     n = 10   n = 20   n = 30   n = (illegible)   n → ∞
  0.1     .337     .296     .282         .269          0.256
  0.2     .475     .439     .427         .416          0.402
  0.3     .570     .544     .535         .527          0.516
  0.4     .644     .628     .623         .619          0.611
  0.5     .704     .700     .698         .696          0.693
  0.6     .756     .762     .763         .767          0.766
  0.7     .800     .816     .822         .829          0.832
  0.8     .839     .866     .875         .886          0.893
  0.9     .874     .911     .923         .938          0.948


Table I gives values of the mean square error ratio for various values of $n$ and $p$. According to this criterion the second estimator ($\epsilon_2$) is more accurate than the first ($\epsilon_1$) for all cases shown in this table.

It can be shown that this ratio of mean squares must always be less than one, except in the trivial case $n = 1$. The relative advantage of $\epsilon_2$ increases as $p$ diminishes and does not vary greatly with $n$.
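Table I can be recomputed from (5) together with $\sigma^2(\epsilon_1) = p(1 - p)/n$. The sketch below uses names of our own choosing. Under these formulas it reproduces, for example, the entries .337 ($p = 0.1$) and .704 ($p = 0.5$) in the first column, and it approaches the limiting values $-p \log p/(1 - p)$ of the last column.

```python
import math


def f(s, n):
    """f(s, n) = 1 - exp(-s/n), eq. (6)."""
    return -math.expm1(-s / n)


def mse_ratio(p: float, n: int) -> float:
    """MSE(eps2) / MSE(eps1) for a Poisson population with zero-class probability p."""
    m1 = p ** (n * f(1, n))            # E(eps2), eq. (5) with s = 1
    m2 = p ** (n * f(2, n))            # E(eps2^2), eq. (5) with s = 2
    mse2 = m2 - 2.0 * p * m1 + p * p   # E[(eps2 - p)^2]
    mse1 = p * (1.0 - p) / n           # variance of the binomial proportion eps1
    return mse2 / mse1


def limiting_ratio(p: float) -> float:
    """Large-n limit of mse_ratio: -p log p / (1 - p)."""
    return -p * math.log(p) / (1.0 - p)


for p in (0.1, 0.5, 0.9):
    row = [mse_ratio(p, n) for n in (10, 20, 30)]
    print(p, [f"{r:.3f}" for r in row], f"{limiting_ratio(p):.3f}")
```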

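The correlation between the two estimators is


(8) $\rho(\epsilon_1, \epsilon_2) = (np)^{1/2}(1 - p)^{-1/2}\,(p^{-f(1,n)} - 1)\,\{p^{\,n[f(2,n) - 2f(1,n)]} - 1\}^{-1/2},$


whence
(9) $\lim_{n \to \infty} \rho(\epsilon_1, \epsilon_2) = \{-p(1 - p)^{-1} \log p\}^{1/2}.$


$\rho(\epsilon_1, \epsilon_2)$ approaches this limit rapidly as $n$ increases. We note that


(10) $\lim_{n \to \infty} \rho(\epsilon_1, \epsilon_2) = \lim_{n \to \infty} \{\sigma(\epsilon_2)/\sigma(\epsilon_1)\},$
as is to be expected since $\epsilon_2$ is the maximum likelihood estimator of $p$ [3].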


5. A third estimator of $p$. The superiority of $\epsilon_2$ as an estimator of $p$ is to be expected, since $\bar v$ is a sufficient statistic for $p$. Using the method described in [4], we obtain the minimum variance unbiased estimator¹


(11) $\epsilon_3 = (1 - n^{-1})^{n \bar v},$


which may be regarded as a modified, and perhaps improved, form of $\epsilon_2$.

The variance of $\epsilon_3$ is $p^2(p^{-1/n} - 1)$. This differs but little from the mean square error of $\epsilon_2$, as is to be expected since $(1 - n^{-1})^n \doteq e^{-1}$. It appears that for sufficiently large values of $n$ the mean square error of $\epsilon_3$ will be slightly less than that of $\epsilon_2$ for $p < 0.45$, while for $p > 0.45$ the mean square error of $\epsilon_2$ will be slightly the smaller. The performance of $\epsilon_3$ compared with $\epsilon_1$ will be practically identical with that of $\epsilon_2$.
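The two properties of $\epsilon_3$ quoted above can be checked numerically. The sketch below uses names of our own choosing. It sums the Poisson series for $E(\epsilon_3)$ term by term and compares $\sigma^2(\epsilon_3)$ with the mean square error of $\epsilon_2$ obtained from (5). As a hedged cross-check of the 0.45 figure, our own second-order expansion gives $\text{MSE}(\epsilon_2) - \sigma^2(\epsilon_3) \approx p^2(\tfrac{5}{4}a^2 - a)n^{-2}$ with $a = -\log p$. This changes sign at $p = e^{-0.8} \approx 0.449$.

```python
import math


def f(s, n):
    return -math.expm1(-s / n)


def mse_eps2(p, n):
    """Mean square error of eps2 = exp(-vbar), from eq. (5)."""
    m1 = p ** (n * f(1, n))
    m2 = p ** (n * f(2, n))
    return m2 - 2.0 * p * m1 + p * p


def var_eps3(p, n):
    """Variance p^2 (p^(-1/n) - 1) of eps3 = (1 - 1/n)^(n * vbar)."""
    return p * p * math.expm1(-math.log(p) / n)


def mean_eps3_by_summation(m, n, terms=200):
    """E[(1 - 1/n)^T] with T = n * vbar ~ Poisson(n m), summed term by term."""
    lam = n * m
    pmf = math.exp(-lam)          # P(T = 0)
    total = 0.0
    for t in range(terms):
        total += pmf * (1.0 - 1.0 / n) ** t
        pmf *= lam / (t + 1)      # P(T = t + 1) from P(T = t)
    return total


print(mean_eps3_by_summation(0.7, 5), math.exp(-0.7))
for p in (0.2, 0.8):
    print(p, var_eps3(p, 200) < mse_eps2(p, 200))
```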

6. Non-Poisson populations. It is quite possible that $\epsilon_2$ (or $\epsilon_3$) may be used as an estimator of $p$ even when $v$ is not in fact a Poisson variable. It may be that it has been incorrectly assumed that the distribution is Poisson in form or, perhaps, departure from Poisson, though admitted, has been considered of insufficient magnitude to affect the usefulness of $\epsilon_2$.

It is of interest to investigate the effect of deviations from the Poisson distribution on the properties of $\epsilon_1$ and $\epsilon_2$. In order to do this it is first necessary to specify the nature of these deviations. Many forms of modification of the Poisson distribution have been suggested ([5]-[9]). We shall deal only with the simple form of deviation from Poisson wherein the distribution is defined by successive terms in the expansion of


(12) $[(1 + w) - w]^{-m/w}, \qquad -1 < w < 0 \text{ or } 0 < w.$


The expected value of this distribution is $m$, whatever be the value of $w$. If $-1 < w < 0$, then putting $w = -P$, $1 + w = Q$, $NP = m$ we have the binomial distribution


(13) $\Pr\{v = r\} = \binom{N}{r} P^r Q^{N - r},$


¹ I am indebted to the referee for suggesting the use of this estimator.





If $0 < w$ we have the negative binomial distribution. Putting $w = 2\sigma^2$, $m = f\sigma^2$ we have

(14) $\Pr\{v = r\} = \dfrac{\Gamma(r + \frac{1}{2}f)}{r!\,\Gamma(\frac{1}{2}f)} \cdot \dfrac{(2\sigma^2)^r}{(1 + 2\sigma^2)^{r + \frac{1}{2}f}},$


a form of the Pólya-Eggenberger [10] distribution previously obtained by Greenwood & Yule [11], which can be considered to arise from a mixture of Poisson distributions with expected values distributed proportionately to $\chi^2$ (namely as $\sigma^2\chi^2$) with $f$ degrees of freedom. As $w \to 0$, the distribution tends to the Poisson form whether $w$ is moving through positive or negative values.

Whether $w$ is positive or negative, the probability of the zero class is


(15) $p = (1 + w)^{-m/w}.$


The moments and moment-ratios of $\epsilon_1$ are the same functions of $p$ as in the Poisson case. It can be shown that


(16) $\mu'_s(\epsilon_2) = \{1 + w f(s, n)\}^{n \log p/\log(1 + w)},$


where $f(s, n) = 1 - e^{-s/n}$ as in (6), and that the correlation between the two estimators is


(17) $\rho(\epsilon_1, \epsilon_2) = (np)^{1/2}(1 - p)^{-1/2}\,\{[1 + w f(1, n)]^{-\log p/\log(1 + w)} - 1\} \times \{[1 + w f(2, n)]^{n \log p/\log(1 + w)}\,[1 + w f(1, n)]^{-2n \log p/\log(1 + w)} - 1\}^{-1/2}.$


For any value of $p$, $\epsilon_1$ is still an unbiased estimator of $p$, and has the same variance as when the distribution of $v$ is Poisson. $\epsilon_2$ is still a biased estimator of $p$, but the amount of bias and the variance of $\epsilon_2$ are not the same as when the distribution of $v$ is Poisson. Furthermore (7) no longer holds. In fact, putting $s = 1$ in (16),


(18) $E(\epsilon_2) = \{1 + w(1 - e^{-1/n})\}^{n \log p/\log(1 + w)}, \qquad \lim_{n \to \infty} E(\epsilon_2) = p^{\,w/\log(1 + w)} \ne p.$
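Formulas (16) and (18) can be evaluated in the same way. In this sketch (names ours), `moment_eps2_nb` implements (16) and `limit_mean_eps2` implements the limit in (18). For $w > 0$ the limit lies below $p$ and for $w < 0$ it lies above $p$. As $w \to 0$ the Poisson moments (5) are recovered.

```python
import math


def f(s, n):
    return -math.expm1(-s / n)


def moment_eps2_nb(s, n, p, w):
    """Eq. (16): s-th moment about zero of eps2 when v has p.g.f. (1 + w - w t)^(-m/w)."""
    expo = n * math.log(p) / math.log1p(w)
    return (1.0 + w * f(s, n)) ** expo


def limit_mean_eps2(p, w):
    """Eq. (18): large-n limit p^(w / log(1 + w)) of E(eps2)."""
    return p ** (w / math.log1p(w))


for w in (-0.2, 0.5):
    print(w, moment_eps2_nb(1, 10**6, 0.5, w), limit_mean_eps2(0.5, w))
```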


7. Approximations. Since the formulae in (16) and (17) are tedious to compute, it seemed worth while investigating whether any simple approximations were possible. The following expansions in powers of $n^{-1/2}$ up to the term in $n^{-1}$ were found to give generally good results for $n > 30$.


(19.1) $E(\epsilon_2) \doteq e^{-m}[1 + \tfrac{1}{2}m(1 + w)n^{-1}],$
(19.2) $\sigma^2(\epsilon_2) \doteq e^{-2m} m(1 + w) n^{-1},$
(19.3) $\sqrt{\beta_1}(\epsilon_2) \doteq [nm(1 + w)]^{-1/2}[3m(1 + w) - (1 + 2w)],$
(19.4) $\beta_2(\epsilon_2) \doteq 3 + [nm(1 + w)]^{-1}[16m^2(1 + w)^2 - 12m(1 + w)(1 + 2w) + 1 + 6w + 6w^2],$
(19.5) $\rho(\epsilon_1, \epsilon_2) \doteq (-wp \log p)^{1/2}[(1 + w)(1 - p)\log(1 + w)]^{-1/2}\,[1 + (\tfrac{1}{4}m + \tfrac{1}{2}w - \tfrac{1}{4}mw)n^{-1}].$





The values of $E(\epsilon_2)$ and $\sigma^2(\epsilon_2)$ obtained from the exact formula (16) and from (19) are compared in Tables II and III respectively.

It should be noted that some of the values of $w$ shown do not correspond to real distributions. These cases are indicated by parentheses enclosing the corresponding figures. The values of $w$ chosen exhibit the trend of mathematical functions of $w$ which do give the moments of $\epsilon_2$ for real distributions when $w$ takes certain special values, different for different $p$. The functions are simple continuous functions of $w$ and the method of presentation should not prove misleading.


TABLE II
Expected value of $\epsilon_2$
(Note: The exact values and (19.1) agree to three decimal places for all cases included in this table.)
[Table body not recoverable from the scan; one column is headed $n = 30$.]


TABLE III
Approximate and exact values of $100\,\sigma^2(\epsilon_2)$
[Table body not recoverable from the scan; approximate and exact values are given side by side, with one pair headed $n = 30$.]

Close agreement was also obtained between values given by (19.3)-(19.5) and the corresponding exact values. The approximation to $\sqrt{\beta_1}(\epsilon_2)$ was generally correct to two decimal places and that to $\rho(\epsilon_1, \epsilon_2)$ was generally correct to three places for the values of $n$, $w$ and $p$ in Tables II and III. $\beta_2(\epsilon_2)$ was correct to two decimal places for $w$ negative, while for positive $w$ the error did not exceed 0.04 except for $p = 0.1$ and $w = 1.0$ (5.09 (approx.) against 5.46).
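The comparison behind Tables II and III can be sketched as follows (names ours). The exact mean and variance of $\epsilon_2$ come from (16) and are set against (19.1) and (19.2), with $m$ recovered from $p$ and $w$ through (15). For $n = 30$, $p = 0.5$, $w = 0.5$ the two means differ by roughly $10^{-4}$. This is in line with the three-decimal agreement reported for Table II.

```python
import math


def f(s, n):
    return -math.expm1(-s / n)


def exact_mean_var_eps2(n, p, w):
    """Mean and variance of eps2 from the exact moments (16)."""
    expo = n * math.log(p) / math.log1p(w)
    mean = (1.0 + w * f(1, n)) ** expo
    second = (1.0 + w * f(2, n)) ** expo
    return mean, second - mean ** 2


def approx_mean_var_eps2(n, p, w):
    """Approximations (19.1) and (19.2); m is recovered from p = (1 + w)^(-m/w), eq. (15)."""
    m = -w * math.log(p) / math.log1p(w)
    mean = math.exp(-m) * (1.0 + 0.5 * m * (1.0 + w) / n)
    var = math.exp(-2.0 * m) * m * (1.0 + w) / n
    return mean, var


for w in (-0.2, 0.5, 1.0):
    print(w, exact_mean_var_eps2(30, 0.5, w), approx_mean_var_eps2(30, 0.5, w))
```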


TABLE IV
Values of $n(w, p)$
[Table body not recoverable from the scan; the legible column headings are $w = -0.2$ and $w = -0.1$.]

* Formula (21) gives negative values in these cases.


TABLE V
Values of $f_{-2}(p)$, $f_{-1}(p)$ and $f_0(p)$

   p     $f_{-2}(p)$   $f_{-1}(p)$   $f_0(p)$
  0.1      5.0528       10.8992       7.5954
  0.2      3.6916        6.4732       5.1006
  0.3      3.1164        3.9314       3.8761
  0.4      2.7809        1.6529       2.7640
  0.5      2.5547       -0.9192       1.4261
  0.6      2.3889       -4.3392      -0.4658
  0.7      2.2606       -9.6654      -3.5511
  0.8      2.1574      -19.9286      -9.6719
  0.9      2.0722      -50.1476     -28.0060


8. A critical sample size. Using the approximate formulae (19) we see that the mean square error of $\epsilon_1$ will be less than that of $\epsilon_2$ provided


(20) $p(1 - p)n^{-1} < (e^{-m} - p)^2 + m(1 + w)e^{-m}(e^{-m} - p)n^{-1} + m(1 + w)e^{-2m}n^{-1}.$
This can be rewritten $n > n(w, p)$, where
(21) $n(w, p) = [p(1 - p) - m(1 + w)e^{-m}(2e^{-m} - p)](e^{-m} - p)^{-2}.$


Provided the value of $n(w, p)$ given by (21) is sufficiently large for the approximation in (19) to be good, it can be said that $\epsilon_1$ will be a better estimator of $p$ than $\epsilon_2$ (according to the mean square error criterion) if the sample size is bigger than $n(w, p)$. For smaller sample sizes it is likely that $\epsilon_2$ will still be the superior estimator as in the Poisson case.


When $|w|$ is small the expansion

$$n(w,p) = w^{-2} f_{-2}(p) + w^{-1} f_{-1}(p) + f_0(p) + \cdots, \tag{22}$$

where

$$f_{-2}(p) = 4(p\log p)^{-2}\left[p(1-p) + p^2\log p\right], \tag{23.1}$$





PROBABILITY OF ZERO CLASS 101 


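$$f_{-1}(p) = 4(p\log p)^{-2}\left[\left(\tfrac13 - \tfrac12\log p\right)p(1-p) + \left(\tfrac{11}{6} + \log p\right)p^2\log p\right], \tag{23.2}$$

$$f_0(p) = 4(p\log p)^{-2}\left[\left\{\tfrac{5}{48}(\log p)^2 - \tfrac{1}{12}\log p - \tfrac{1}{12}\right\}p(1-p) + \left\{\tfrac{11}{48}(\log p)^2 + \tfrac{5}{3}\log p + \tfrac{5}{6}\right\}p^2\log p\right] \tag{23.3}$$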


is useful. The values of $n(w, p)$ given by the series (22), taken as far as $f_0(p)$, agree (to the nearest ten) with those in Table IV, which were calculated from (21). Values of $f_{-2}(p)$, $f_{-1}(p)$ and $f_0(p)$ for $p = 0.1(0.1)0.9$ are given in Table V.
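The coefficients (23.1)-(23.3) and the truncated series (22) can be checked numerically. The following is a minimal Python sketch; the function names are hypothetical. As examples, it returns $f_{-2}(0.5) \approx 2.5547$, $f_{-1}(0.2) \approx 6.4732$ and $f_0(0.3) \approx 3.8761$, which agree with Table V.

```python
import math


def f_m2(p: float) -> float:
    """f_{-2}(p) from (23.1)."""
    L = math.log(p)
    return 4.0 * (p * (1 - p) + p * p * L) / (p * L) ** 2


def f_m1(p: float) -> float:
    """f_{-1}(p) from (23.2)."""
    L = math.log(p)
    bracket = (1 / 3 - L / 2) * p * (1 - p) + (11 / 6 + L) * p * p * L
    return 4.0 * bracket / (p * L) ** 2


def f_0(p: float) -> float:
    """f_0(p) from (23.3)."""
    L = math.log(p)
    c1 = 5 / 48 * L * L - L / 12 - 1 / 12    # multiplies p(1 - p)
    c2 = 11 / 48 * L * L + 5 / 3 * L + 5 / 6  # multiplies p^2 log p
    return 4.0 * (c1 * p * (1 - p) + c2 * p * p * L) / (p * L) ** 2


def n_series(w: float, p: float) -> float:
    """Series (22) truncated after the f_0(p) term."""
    return f_m2(p) / w**2 + f_m1(p) / w + f_0(p)


for k in range(1, 10):
    p = k / 10
    print(f"{p:.1f} {f_m2(p):9.4f} {f_m1(p):9.4f} {f_0(p):9.4f}")
```

For $w = -0.1$ and $p = 0.5$, `n_series` gives about 266, which rounds to the same ten as the value from (21).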


REFERENCES 

[1] E. JOHNSON, "Estimates of parameters by means of least squares", Annals of Math. Stat., Vol. 11 (1940), p. 453.

[2] R. A. FISHER, The Design of Experiments, 1st ed., Oliver & Boyd, 1935, p. 221.

[3] D. BLACKWELL, "Conditional expectation and unbiased sequential estimation", Annals of Math. Stat., Vol. 18 (1947), p. 105.

[4] R. A. FISHER, "Theory of statistical estimation", Proc. Camb. Phil. Soc., Vol. 22 (1925), p. 700.

[5] C. V. L. CHARLIER, Die Grundzüge der Mathematischen Statistik, Lund, Verlag Scientia, 1920, p. 80.

[6] G. PÓLYA, "Sur quelques points de la théorie des probabilités", Ann. de l'Inst. H. Poincaré, Vol. 1 (1931), p. 117.

[7] J. NEYMAN, "On a new class of 'contagious' distributions", Annals of Math. Stat., Vol. 10 (1939), p. 35.

[8] W. FELLER, "On a general class of 'contagious' distributions", Annals of Math. Stat., Vol. 14 (1943), p. 389.

[9] M. THOMAS, "A generalization of Poisson's binomial limit for use in ecology", Biometrika, Vol. 36 (1949), p. 18.

[10] F. EGGENBERGER AND G. PÓLYA, "Über die Statistik verketteter Vorgänge", Zeit. für Ang. Math. und Mech., Vol. 1 (1923), p. 279.

[11] M. GREENWOOD AND G. YULE, "An inquiry into the nature of frequency distributions of multiple happenings", Jour. Roy. Stat. Soc., Vol. 83 (1920), p. 255.







TESTING PROPORTIONALITY OF COVARIANCE MATRICES¹


By WALTER T. FEDERER
Cornell University 


1. Summary. The problem of comparing covariance matrices for proportionality often arises in genetic experiments. Knowledge of the nonproportionality of covariance matrices is useful in selection work and in genetic interpretations. The likelihood ratio criterion was used to develop a test of significance for this contrast. Likelihood ratio tests were obtained for two and for three sets of independent variance-covariance matrices. The test for r independent covariance matrices was indicated, and some unsolved problems were cited.

2. Introduction. Tests of significance of variances from normally distributed variates are available for testing the equality of:

(i) Two independent variances (Snedecor's F, Fisher's z, Mahalanobis' z [3], and Fisher and Yates' variance ratio),

(ii) k independent variances (Chi-square tests by Stevens [6], Bartlett [1], and Cochran [2]),

(iii) Two variances with unknown correlation (Pitman's [5] and Morgan's [4] tests, and Wilks' likelihood-ratio test [8]),

(iv) k variances and the associated covariances (Wilks' likelihood-ratio test [8]),

(v) The variances and covariances within each of several sets and the covariances between sets (likelihood-ratio tests by Votaw [7]),

but no tests of significance are available for comparing the proportionality of two or more variance-covariance matrices.

The hypothesis of proportionality of variance-covariance matrices is more 
tenable than equality in many genetic experiments, since it is known that the 
variances are unequal but it is not known if the variance or covariance for one 
strain is merely a multiple of that for the other strain. Knowledge of this is of 
importance in any genetic study on the inheritance of characters and in selection 
work. In addition, the means and variances are often related in some manner 
and a transformation of the data may not be advisable since this may lead to 
incorrect genetic interpretations. 

3. Likelihood ratio for comparing two covariance matrices. The problem of 
testing the hypothesis that the variances and covariances from strain A are 
proportional to the variances and covariances of strain B was solved by an 


¹ Address presented at the Twelfth Summer Meeting of the Institute of Mathematical Statistics, University of Colorado, Boulder, Colorado, August 29, 1949. Plant Breeding paper No. 252. The author wishes to thank LeRoy Powers for supplying the data which led to the developments presented in this paper, and A. M. Mood and others for comments on the paper. A part of this work was done under the auspices of the Bureau of Agricultural Economics, U. S. Department of Agriculture.







PROPORTIONALITY OF COVARIANCE MATRICES 103 


application of the likelihood-ratio test. Let the characters be represented by $X_1, X_2, \dots, X_p$ for strain A and by $Y_1, Y_2, \dots, Y_p$ for strain B. The hypothesis, then, is that each variance or covariance of A equals $K$ times the corresponding variance or covariance of B, that is, $(\sigma_{ij})_x = K(\sigma_{ij})_y$. Here $K$ is a proportionality factor, and the sample variance-covariance matrices for A and B are estimated independently.

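The likelihood ratio for the above, in general terminology, is

$$\lambda = \frac{f(a_{ij}, b_{ij};\ \hat K, \hat\sigma^{ij})}{f(a_{ij};\ \hat\sigma_x^{ij})\, f(b_{ij};\ \hat\sigma_y^{ij})},$$

where

$$a_{ij} = \sum_{\alpha=1}^{n}(X_{i\alpha} - \bar x_i)(X_{j\alpha} - \bar x_j), \qquad b_{ij} = \sum_{\beta=1}^{m}(Y_{i\beta} - \bar y_i)(Y_{j\beta} - \bar y_j).$$

The symbols are as follows:

- $X_{i\alpha}$ and $Y_{i\beta}$ are the sample elements and $\bar x_i$ and $\bar y_i$ the sample means, $i, j = 1, 2, \dots, p$.
- $\hat K$ and $\hat\sigma^{ij}$ are the maximum likelihood estimates of $K$ and $\sigma^{ij}$ computed under the hypothesis that $(\sigma_{ij})_x = K(\sigma_{ij})_y$. Here $\sigma^{ij}$ denotes the elements of the inverse of $\|\sigma_{ij}\|$.
- $\hat\sigma_x^{ij}$ and $\hat\sigma_y^{ij}$ are the maximum likelihood estimates of $\sigma_x^{ij}$ and $\sigma_y^{ij}$ computed under the hypothesis of independence.
- There are $n - 1$ degrees of freedom associated with the $a_{ij}$ and $m - 1$ with the $b_{ij}$.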

It is known that the sums of squares and cross products of $p$ normally distributed variates follow the Wishart distribution, here with $n - 1$ degrees of freedom. Furthermore, the joint distribution of two independent sets of sums of squares and cross products may be written as

$$f(a_{ij}, b_{ij};\ \sigma_x^{ij}, \sigma_y^{ij}) = f(a_{ij};\ \sigma_x^{ij})\, f(b_{ij};\ \sigma_y^{ij}),$$

which is proportional to

$$|\sigma_x^{ij}|^{(n-1)/2}\,|\sigma_y^{ij}|^{(m-1)/2}\,|a_{ij}|^{(n-p-2)/2}\,|b_{ij}|^{(m-p-2)/2}\exp\Big[-\tfrac12\sum_{i=1}^{p}\sum_{j=1}^{p}\big(\sigma_x^{ij}a_{ij} + \sigma_y^{ij}b_{ij}\big)\Big].$$

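The maximum likelihood estimates of $\sigma_x^{ij}$ and $\sigma_y^{ij}$ are

$$\|\hat\sigma_x^{ij}\| = \|a_{ij}/(n-1)\|^{-1} \quad\text{and}\quad \|\hat\sigma_y^{ij}\| = \|b_{ij}/(m-1)\|^{-1};$$

also $(\hat\sigma_{ij})_x = a_{ij}/(n-1)$ and $(\hat\sigma_{ij})_y = b_{ij}/(m-1)$.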
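Now under the hypothesis that the variances and covariances are proportional, i.e., $(\sigma_{ij})_x = K(\sigma_{ij})_y$, the joint distribution is proportional to

$$|\sigma^{ij}/K|^{(n-1)/2}\,|\sigma^{ij}|^{(m-1)/2}\,|a_{ij}|^{(n-p-2)/2}\,|b_{ij}|^{(m-p-2)/2}\exp\Big[-\tfrac12\sum_{i=1}^{p}\sum_{j=1}^{p}\sigma^{ij}\big(a_{ij}/K + b_{ij}\big)\Big],$$

where $\sigma^{ij}$ stands for $\sigma_y^{ij}$.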

The maximum likelihood estimates of $K$ and $\sigma^{ij}$ are obtained from the equations

$$\hat K = \sum_{i=1}^{p}\sum_{j=1}^{p}\hat\sigma^{ij}a_{ij}\Big/p(n-1)$$





104 WALTER T. FEDERER 


and

$$\|\hat\sigma^{ij}\| = \|(a_{ij} + \hat K b_{ij})/\hat K(n+m-2)\|^{-1}.$$


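It is possible to solve for $\hat K$ from the above two equations. For $p = 2$ the equation in $\hat K$ free of $\hat\sigma^{ij}$ is

$$\hat K^2\begin{vmatrix} b_{11} & b_{12}\\ b_{21} & b_{22}\end{vmatrix} + \hat K\left(1 - \frac{n+m-2}{2(n-1)}\right)\left(\begin{vmatrix} a_{11} & a_{12}\\ b_{21} & b_{22}\end{vmatrix} + \begin{vmatrix} b_{11} & b_{12}\\ a_{21} & a_{22}\end{vmatrix}\right) + \left(1 - \frac{2(n+m-2)}{2(n-1)}\right)\begin{vmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\end{vmatrix} = 0.$$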

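For $p = 3$, $\hat K$ is obtained by solving the following equation:

$$\hat K^3\begin{vmatrix} b_{11}&b_{12}&b_{13}\\ b_{21}&b_{22}&b_{23}\\ b_{31}&b_{32}&b_{33}\end{vmatrix} + \hat K^2\left(1-\frac{n+m-2}{3(n-1)}\right)\left(\begin{vmatrix} a_{11}&a_{12}&a_{13}\\ b_{21}&b_{22}&b_{23}\\ b_{31}&b_{32}&b_{33}\end{vmatrix} + \begin{vmatrix} b_{11}&b_{12}&b_{13}\\ a_{21}&a_{22}&a_{23}\\ b_{31}&b_{32}&b_{33}\end{vmatrix} + \begin{vmatrix} b_{11}&b_{12}&b_{13}\\ b_{21}&b_{22}&b_{23}\\ a_{31}&a_{32}&a_{33}\end{vmatrix}\right)$$

$$+\ \hat K\left(1-\frac{2(n+m-2)}{3(n-1)}\right)\left(\begin{vmatrix} a_{11}&a_{12}&a_{13}\\ a_{21}&a_{22}&a_{23}\\ b_{31}&b_{32}&b_{33}\end{vmatrix} + \begin{vmatrix} a_{11}&a_{12}&a_{13}\\ b_{21}&b_{22}&b_{23}\\ a_{31}&a_{32}&a_{33}\end{vmatrix} + \begin{vmatrix} b_{11}&b_{12}&b_{13}\\ a_{21}&a_{22}&a_{23}\\ a_{31}&a_{32}&a_{33}\end{vmatrix}\right) + \left(1-\frac{3(n+m-2)}{3(n-1)}\right)\begin{vmatrix} a_{11}&a_{12}&a_{13}\\ a_{21}&a_{22}&a_{23}\\ a_{31}&a_{32}&a_{33}\end{vmatrix} = 0.$$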


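For $p = 4$ and higher, the coefficients are obtained in a like manner. The coefficient of $\hat K^{p-r}$ is $1 - r(n+m-2)/p(n-1)$ times the sum of the determinants formed by taking $r$ rows from $\|a_{ij}\|$ and the other $p - r$ rows from $\|b_{ij}\|$. The number of determinants for each power of $K$ is therefore the binomial coefficient $\binom{p}{r}$.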
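The polynomial for general $p$ can be assembled without writing out the determinants, as in the minimal Python sketch below. The function names are hypothetical. The sketch uses the identity $\det(A + KB) = \det(B)\prod_i (K + \lambda_i)$, where the $\lambda_i$ are the eigenvalues of $B^{-1}A$. By this identity, the sum $D_r$ of determinants attached to $\hat K^{p-r}$ equals $\det(B)$ times the $r$-th elementary symmetric function of the $\lambda_i$. These eigenvalues are positive when both matrices are positive definite, so each $D_r$ is then positive. The test matrices are made-up values, not data from the paper.

```python
import numpy as np


def k_polynomial(a: np.ndarray, b: np.ndarray, n: int, m: int) -> np.ndarray:
    """Coefficients of the equation for K-hat, highest power of K first.

    det(A + K B) = sum_r D_r K**(p - r), where D_r is the sum of the
    C(p, r) determinants with r rows from A and p - r rows from B.
    Here D_r is obtained as det(B) * e_r(lambda), lambda being the
    eigenvalues of B^-1 A, because det(A + K B) = det(B) prod(K + lambda_i).
    """
    p = a.shape[0]
    lam = np.linalg.eigvals(np.linalg.solve(b, a)).real
    d = np.linalg.det(b) * np.poly(-lam)  # D_0, D_1, ..., D_p
    ratio = (n + m - 2) / (p * (n - 1))
    return np.array([(1.0 - r * ratio) * d[r] for r in range(p + 1)])


def positive_real_roots(coeffs: np.ndarray, tol: float = 1e-9) -> list:
    """Positive real roots of the polynomial with the given coefficients."""
    return sorted(float(z.real) for z in np.roots(coeffs)
                  if abs(z.imag) < tol and z.real > 0)


# Hypothetical positive definite sums of squares and products (p = 3).
A = np.array([[4.0, 1.0, 0.5], [1.0, 3.0, 0.2], [0.5, 0.2, 2.0]])
B = np.array([[2.0, 0.3, 0.1], [0.3, 1.0, 0.4], [0.1, 0.4, 1.5]])
coeffs = k_polynomial(A, B, n=15, m=20)
print(coeffs, positive_real_roots(coeffs))
```

If $A$ is exactly proportional to $B$ and $n = m$, the single positive root equals the proportionality factor.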

A proof that the polynomial in $K$ has only one positive root would be obtained by proving that the sum of the determinants associated with any power of $K$ is positive.² The factors $1 - r(n+m-2)/p(n-1)$ multiplying these sums are positive up to a certain point and negative thereafter. There is then only one change in sign and thus, by Descartes' rule of signs, only one positive real root. The positive root is the only one of interest here, since the variances are inherently positive.

The likelihood ratio for comparing the proportionality of the variances and 


² A proof of this was first called to my attention by Isadore Blumen.







covariances for strains A and B is 


$$\lambda = \frac{\hat K^{p(m-1)/2}\,|a_{ij}/(n-1)|^{(n-1)/2}\,|b_{ij}/(m-1)|^{(m-1)/2}}{|(a_{ij} + \hat K b_{ij})/(n+m-2)|^{(n+m-2)/2}},$$

where $-2\log\lambda$ is distributed approximately as chi-square with

$$p(p+1) - \tfrac12 p(p+1) - 1 = \tfrac12 p(p+1) - 1$$

degrees of freedom when $m$ and $n$ are large.
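The two-matrix test can be computed end to end, as in the minimal Python sketch below. The function names are hypothetical. The sketch finds $\hat K$ from the equation $\operatorname{tr}\{(A + KB)^{-1}A\} = p(n-1)/(n+m-2)$, which is what the two maximum likelihood equations reduce to once $\hat\sigma^{ij}$ is eliminated. It uses bisection rather than the determinant polynomial. Bisection works here because the left side falls strictly from $p$ toward 0 as $K$ grows: its derivative is minus the trace of a product of two positive definite matrices. The statistic is $-2\log\lambda$ with $\lambda$ in the form displayed above.

```python
import numpy as np


def estimate_k(a, b, n, m, tol=1e-12):
    """ML estimate of K: root of tr((A + K B)^-1 A) = p (n - 1) / (n + m - 2)."""
    p = a.shape[0]
    target = p * (n - 1) / (n + m - 2)

    def g(k):
        return np.trace(np.linalg.solve(a + k * b, a)) - target

    lo, hi = 0.0, 1.0
    while g(hi) > 0:          # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:        # root lies above mid
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def proportionality_test(a, b, n, m):
    """Return (K-hat, -2 log lambda, degrees of freedom) for two matrices."""
    p = a.shape[0]
    k = estimate_k(a, b, n, m)
    big_n = n + m - 2

    def logdet(x):
        return np.linalg.slogdet(x)[1]  # log |x| for positive definite x

    log_lam = (p * (m - 1) / 2 * np.log(k)
               + (n - 1) / 2 * logdet(a / (n - 1))
               + (m - 1) / 2 * logdet(b / (m - 1))
               - big_n / 2 * logdet((a + k * b) / big_n))
    return k, -2.0 * log_lam, p * (p + 1) // 2 - 1


B = np.array([[2.0, 0.5], [0.5, 1.0]])
print(proportionality_test(2.0 * B, B, n=11, m=11))       # exactly proportional
print(proportionality_test(np.diag([4.0, 1.0]), np.eye(2), n=11, m=11))
```

The statistic can then be referred to a chi-square table with the returned degrees of freedom, for example through `scipy.stats.chi2.sf` if SciPy is available.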

4. Likelihood ratio for comparing three covariance matrices. In the event 
that 3 independently estimated sets of variances and covariances are compared 
for proportionality, the likelihood ratio is 


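$$\lambda = \frac{f(a_{ij}, b_{ij}, c_{ij};\ \hat K_1, \hat K_2, \hat\sigma^{ij})}{f(a_{ij};\ \hat\sigma_x^{ij})\,f(b_{ij};\ \hat\sigma_y^{ij})\,f(c_{ij};\ \hat\sigma_z^{ij})},$$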


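where $c_{ij} = \sum_{\gamma=1}^{q}(Z_{i\gamma} - \bar z_i)(Z_{j\gamma} - \bar z_j)$. The remaining symbols are as follows:

- $Z_{i\gamma}$ are the sample elements and $\bar z_i$ the sample means.
- $\hat K_1$, $\hat K_2$, and $\hat\sigma^{ij}$ are the maximum likelihood estimates of $K_1$, $K_2$, and $\sigma^{ij}$ computed under the hypothesis of proportionality, that is, $(\sigma_{ij})_x = K_1(\sigma_{ij})_y = K_2(\sigma_{ij})_z$.
- $(\hat\sigma_{ij})_x$, $(\hat\sigma_{ij})_y$, and $(\hat\sigma_{ij})_z$ are the maximum likelihood estimates of $(\sigma_{ij})_x$, $(\sigma_{ij})_y$, and $(\sigma_{ij})_z$ computed under the hypothesis of independence.
- $a_{ij}$, $b_{ij}$, and $c_{ij}$ have $n - 1$, $m - 1$, and $q - 1$ degrees of freedom, respectively, and the $a_{ij}$ and $b_{ij}$ are as defined previously.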

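Under the hypothesis of independence the maximum likelihood estimates of $\sigma_x^{ij}$, $\sigma_y^{ij}$, and $\sigma_z^{ij}$ are $\hat\sigma_x^{ij}$ and $\hat\sigma_y^{ij}$, given in Section 3, and $\hat\sigma_z^{ij}$. The last can be obtained from the equation $\|\hat\sigma_z^{ij}\| = \|c_{ij}/(q-1)\|^{-1}$.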

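Under the hypothesis of proportionality the maximum likelihood estimates of $K_1$, $K_2$, and $\sigma^{ij}$ are obtained from the equations

$$\hat K_2 = \hat K_1\sum_{i}\sum_{j}\hat\sigma^{ij}b_{ij}\Big/p(m-1),$$

$$\hat K_2 = \sum_{i}\sum_{j}\hat\sigma^{ij}(a_{ij} + \hat K_1 b_{ij})\Big/p(n+m-2),$$

and

$$\|\hat\sigma^{ij}\| = \|(a_{ij} + \hat K_1 b_{ij} + \hat K_2 c_{ij})/\hat K_2(n+m+q-3)\|^{-1}.$$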
The positive roots for $\hat K_1$ and $\hat K_2$ which maximize the likelihood ratio are the ones used; there is probably only one for each proportionality constant. Substituting these values in the likelihood ratio gives

$$\lambda = \frac{\hat K_1^{p(m-1)/2}\,\hat K_2^{p(q-1)/2}\,|a_{ij}/(n-1)|^{(n-1)/2}\,|b_{ij}/(m-1)|^{(m-1)/2}\,|c_{ij}/(q-1)|^{(q-1)/2}}{|(a_{ij} + \hat K_1 b_{ij} + \hat K_2 c_{ij})/(n+m+q-3)|^{(n+m+q-3)/2}}.$$


When sample sizes are large, $-2\log\lambda$ is distributed approximately as chi-square with $p(p+1) - 2$ degrees of freedom.


The method for comparing $r$ independent sets of variances and covariances follows by a simple extension of the above likelihood ratio and of the solution for the $r - 1$ proportionality constants.

5. Unsolved problems. The nature of the roots for the proportionality con- 
stants requires further study. Also, likelihood ratios could be developed for 
comparing the proportionality of r non-independent covariance matrices under 
various hypotheses. A study of these tests of significance could be made in 
much the same way as described by Votaw [7]. Such a study is necessary before 
a complete understanding of this test is obtained. 


REFERENCES 


[1] M. S. BARTLETT, "Some examples of statistical methods of research in agriculture and applied biology," Jour. Roy. Stat. Soc., Suppl., Vol. 4 (1937), pp. 137-183.

[2] W. G. COCHRAN, "Problems arising in the analysis of a series of similar experiments," Jour. Roy. Stat. Soc., Suppl., Vol. 4 (1937), pp. 102-118.

[3] P. C. MAHALANOBIS, "Statistical notes for agricultural workers. No. 3. Auxiliary tables for Fisher's z-test in analysis of variance," Ind. Jour. Agri. Sci., Vol. 2 (1932), pp. 679-693.

[4] W. A. MORGAN, "A test for the significance of the difference between the two variances in a sample from a normal bivariate population," Biometrika, Vol. 31 (1939), pp. 13-19.

[5] E. J. G. PITMAN, "A note on normal correlation," Biometrika, Vol. 31 (1939), pp. 9-12.

[6] W. L. STEVENS, "Heterogeneity of a set of variances," Jour. of Genetics, Vol. 33 (1936), pp. 398-399.

[7] D. F. VOTAW, JR., "Testing compound symmetry in a normal multivariate distribution," Annals of Math. Stat., Vol. 19 (1948), pp. 447-473.

[8] S. S. WILKS, "Sample criteria for testing equality of means, equality of variances, and equality of covariances in a normal multivariate distribution," Annals of Math. Stat., Vol. 17 (1946), pp. 257-281.





AN INVERSE MATRIX ADJUSTMENT ARISING IN 
DISCRIMINANT ANALYSIS 


By M. S. BARTLETT


University of Manchester, England 


1. Introduction. The adjustment of an inverse matrix arising from the change of a single element, or of elements in a single row or column, in the original matrix has recently been discussed by Sherman and Morrison [1, 2]. In discriminant function analysis, the adjustment due to the addition of a degenerate matrix of rank one to the original matrix has sometimes been required. The method used by the writer is described in this note. It will be noticed that this case includes the cases considered by Sherman and Morrison.


2. General formula. The new square matrix can always be written in the form

$$B = A + uv', \tag{1}$$

where $u$ is a column vector (single column matrix) and $v'$ a row vector (dashes denote matrix transposes). We write formally

$$\begin{aligned} B^{-1} = (A + uv')^{-1} &= A^{-1}(I + uv'A^{-1})^{-1} \\ &= A^{-1}(I - uv'A^{-1} + uv'A^{-1}uv'A^{-1} - \cdots) \\ &= A^{-1} - A^{-1}u\cdot v'A^{-1}\{1 - v'A^{-1}u + (v'A^{-1}u)^2 - \cdots\} \\ &= A^{-1} - \frac{A^{-1}u\cdot v'A^{-1}}{1 + v'A^{-1}u}, \end{aligned} \tag{2}$$

which has the same simple structure as (1) and can be determined when $A^{-1}$ is known. To check this formal result, we may easily verify that pre- or post-multiplication of the expression (2) by $B$ gives the unit matrix.
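Formula (2) translates directly into a few lines of NumPy; a minimal sketch follows. The function name `rank_one_update_inverse` and the test matrices are illustrative. The check at the end is the same pre- and post-multiplication by $B$ described above.

```python
import numpy as np


def rank_one_update_inverse(a_inv: np.ndarray, u: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Return (A + u v')^-1 from A^-1 using formula (2).

    u and v are 1-D arrays standing for the column vector u and the row vector v'.
    The denominator 1 + v' A^-1 u must be non-zero.
    """
    a_inv_u = a_inv @ u          # A^-1 u   (column)
    v_a_inv = v @ a_inv          # v' A^-1  (row)
    denom = 1.0 + v @ a_inv_u    # 1 + v' A^-1 u
    return a_inv - np.outer(a_inv_u, v_a_inv) / denom


A = np.diag([2.0, 3.0, 4.0])
u = np.array([1.0, 0.0, 1.0])
v = np.array([0.0, 1.0, 1.0])
B = A + np.outer(u, v)                            # formula (1)
B_inv = rank_one_update_inverse(np.linalg.inv(A), u, v)
print(np.allclose(B_inv @ B, np.eye(3)), np.allclose(B @ B_inv, np.eye(3)))
```

The update costs two matrix-vector products and an outer product. This is the advantage of the formula when $A^{-1}$ is already available.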


3. Numerical example in discriminant analysis. The general regression relation between two sample matrices $S_2$ and $S_1$ may be written (Bartlett [3])

$$S_2 = C_{21}C_{11}^{-1}S_1 + S_{2\cdot1}. \tag{3}$$

Here the $n$ observations of any variable (measured if necessary from the general mean) comprise one row in the appropriate matrix. $S_2$ and $S_1$ represent the dependent and the independent variables, respectively. $S_1S_1'$ is written $C_{11}$ for convenience, and similarly for $C_{22}$ and $C_{12}$; also $C_{22\cdot1} = S_{2\cdot1}S_{2\cdot1}'$. In discriminant analysis in its strict sense, $S_1$ stands for a single dummy variable serving to isolate a group or other contrast between the proper random variables $S_2$. In that case the equation


$$C_{22} = C_{21}C_{11}^{-1}C_{12} + C_{22\cdot1}, \tag{4}$$





108 M. S. BARTLETT 


derived from (3), becomes of the form

$$C_{22} = zz' + C_{22\cdot1}. \tag{5}$$


The discriminant function coefficients in Fisher's original discussion [4] of this type of analysis are proportional to the solution $a$ of the equation

$$C_{22\cdot1}\,a = z \tag{6}$$

(see Bartlett [3], p. 37). Hence they are obtained as $C_{22\cdot1}^{-1}z$, where $C_{22\cdot1}$ is the matrix of 'sums of squares and products' within groups. In tests of significance of $a$, however, it is convenient (see, for example, Bartlett [5], §5) to make use of the 'inverted regression relation' (first noted by Fisher [4], p. 184)

$$S_1 = C_{12}C_{22}^{-1}S_2 + S_{1\cdot2}, \tag{7}$$

giving discriminant function coefficients $b = C_{22}^{-1}C_{21}$.

It is sometimes required to obtain the second (equivalent) form of solution, involving $C_{22}^{-1}$, from computations already available from the first method of analysis, involving $C_{22\cdot1}^{-1}$. For example, Fisher's original comparison of Iris versicolor and Iris setosa was based on 50 observations on each species of the variables

x₁ = sepal length,
x₂ = sepal width,
x₃ = petal length,
x₄ = petal width.

For $C_{22\cdot1}^{-1}$ he gives (p. 181) the (symmetric positive definite) matrix

(8)
          x₁            x₂            x₃            x₄
x₁    0.1187161
x₂   −0.0668666     0.1452736
x₃   −0.0816158    +0.0334101     0.2193614
x₄   +0.0396350    −0.1107529    −0.2720206     0.8945506
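We take $S_1$ as a pseudo-variate with value $+\tfrac12$ for one species and $-\tfrac12$ for the other. Then $C_{11} = 25$, $C_{21}$ is the column vector of differences in means multiplied by 25, and $z' = C_{12}/\sqrt{25}$. From (5) and (2), the inverse of $C_{22}$ is

$$C_{22}^{-1} = C_{22\cdot1}^{-1} - \frac{C_{22\cdot1}^{-1}zz'C_{22\cdot1}^{-1}}{1 + z'C_{22\cdot1}^{-1}z},$$

or, from (6),

$$C_{22}^{-1} = C_{22\cdot1}^{-1} - \frac{aa'}{1 + z'a}. \tag{9}$$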
Fisher actually gives the solution of (6) with $z$ replaced by the vector of mean differences. In terms of his solution $c$, where

$$c = \begin{pmatrix} -0.0311511\\ -0.1839075\\ +0.2221044\\ +0.3147370 \end{pmatrix}, \tag{10}$$





INVERSE MATRIX ADJUSTMENT 


we find that (9) becomes

$$C_{22}^{-1} = C_{22\cdot1}^{-1} - 0.9146\,cc'. \tag{11}$$

Hence we obtain $C_{22}^{-1}$ (without having to rework it from $C_{22}$) as

(12)
          x₁          x₂          x₃          x₄
x₁    0.11783
x₂   −0.07211     0.11434
x₃   −0.07529    +0.07077     0.17424
x₄   +0.04860    −0.05781    −0.33595     0.80395


With this matrix we can complete the formal regression analysis of $S_1$, giving the following values for $b$ and its 'standard errors':

(13)
            b        standard error
x₁      −0.02847        0.03368
x₂      −0.16808        0.03318
x₃      +0.20298        0.04095
x₄      +0.28764        0.08798


The solution $b$ we know to be a multiple of the solution $c$ (as may be verified to within 2 in the fourth decimal place). We also see from (12) that the first variable is not contributing to the discrimination and might be omitted. The corresponding analysis of variance of $S_1$ (cf. Fisher's Table VII) gives

(14)
                                          D.F.      S.S.
Between species   (x₂, x₃, x₄)              3     24.0785
                  (x₁, partial)             1      0.0069
Within species                             95      0.9146
Total                                      99     25.0000


so that the square of the multiple correlation coefficient is only reduced from 0.96342 to 0.96314 by the omission of $x_1$. It should be noticed that the multiplier 0.9146 in (11) is the 'within species' entry in (14).
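The arithmetic of this example can be reproduced with NumPy; a minimal sketch follows. It rebuilds (12) from Fisher's matrix (8), his solution (10) and the multiplier 0.9146 of (11), taken as printed. It also recomputes the two squared multiple correlation coefficients from the analysis of variance entries, whose total sum of squares is 25. The variable names are illustrative.

```python
import numpy as np

# (8): Fisher's C_{22.1}^{-1}, lower triangle as printed, filled in symmetrically
lower = [
    [0.1187161],
    [-0.0668666, 0.1452736],
    [-0.0816158, 0.0334101, 0.2193614],
    [0.0396350, -0.1107529, -0.2720206, 0.8945506],
]
c_within_inv = np.zeros((4, 4))
for i, row in enumerate(lower):
    for j, value in enumerate(row):
        c_within_inv[i, j] = c_within_inv[j, i] = value

c = np.array([-0.0311511, -0.1839075, 0.2221044, 0.3147370])  # (10)

c22_inv = c_within_inv - 0.9146 * np.outer(c, c)  # (11), giving (12)
print(np.round(c22_inv, 5))

# Squared multiple correlation: 1 - (residual S.S.) / (total S.S. = 25)
r2_all = 1 - 0.9146 / 25
r2_without_x1 = 1 - (0.9146 + 0.0069) / 25
print(round(r2_all, 5), round(r2_without_x1, 5))
```

The rounded matrix agrees with (12) to within a unit in the fifth decimal place, and the two ratios reproduce 0.96342 and 0.96314.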

4. Theoretical example in discriminant analysis. Formula (2) is also theoretically useful in deriving the discriminant function by 'size and shape' suggested by Penrose [6]. For multivariate normal variables $x$ with constant variance matrix $V$, it is known that the ideal discriminant function for contrasting two groups has coefficients $d'V^{-1}$. Here $d$ is the column vector of true differences in means of the two groups. It is now assumed that, after standardization of each variable to unit variance, we can write

$$V = (1-\rho)I + \rho\,ww', \tag{15}$$










where $I$ is the unit matrix and $w$ a column vector with unit components. Applying formula (2), we find the inverse matrix

$$V^{-1} = \frac{I}{1-\rho} - \frac{\rho\,ww'}{(1-\rho)\{1 + (p-1)\rho\}}, \tag{16}$$

where $p$ is the number of variables. Hence


$$\begin{aligned} d'V^{-1} &= \frac{d'}{1-\rho} - \frac{\rho\,(w'd)\,w'}{(1-\rho)\{1+(p-1)\rho\}} \\ &= \frac{1}{1-\rho}\left\{d' - \frac{w'd}{p}\,w'\right\} + \frac{(w'd)\,w'}{p\{1+(p-1)\rho\}}, \end{aligned}\tag{17}$$

where the two sets of coefficients in (17), say $h'$ and $g'$ ($\propto w'$) respectively, are arranged to give zero correlation between $g'x$ and $h'x$. This is checked by evaluating the covariance $E\{w'y\cdot h'y\}$. Here $E$ denotes expectation and $y$ the standardized vector deviate with variance matrix $E\{yy'\} = V$. We have


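$$\begin{aligned} E\{w'y\cdot h'y\} = E\{w'yy'h\} = w'Vh &= w'\{(1-\rho)I + \rho\,ww'\}\,\frac{d - (w'd)\,w/p}{1-\rho} \\ &= \frac{w'd}{p(1-\rho)}\left\{p(1-\rho) + \rho p\,w'w - (1-\rho)\,w'w - \rho\,(w'w)^2\right\} = 0, \end{aligned}$$

since $w'w = p$.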
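The matrix identities (16) and (17) and the zero covariance just derived can be confirmed numerically for any particular case. The minimal NumPy sketch below does this; the values chosen for $p$, $\rho$ and $d$ are hypothetical and serve only as an illustration.

```python
import numpy as np

p, rho = 4, 0.6                          # hypothetical number of variables, correlation
d = np.array([0.5, -1.0, 2.0, 1.5])      # hypothetical vector of mean differences
w = np.ones(p)
V = (1 - rho) * np.eye(p) + rho * np.outer(w, w)          # (15)

# (16): inverse obtained from the rank-one formula (2)
V_inv = np.eye(p) / (1 - rho) - rho * np.outer(w, w) / ((1 - rho) * (1 + (p - 1) * rho))

# (17): d'V^-1 split into a 'shape' part h' and a 'size' part g' (proportional to w')
h = (d - (w @ d) / p * w) / (1 - rho)
g = (w @ d) * w / (p * (1 + (p - 1) * rho))

print(np.allclose(V_inv, np.linalg.inv(V)))   # (16) agrees with a direct inverse
print(np.allclose(d @ V_inv, h + g))          # (17) reassembles d'V^-1
print(np.isclose(w @ V @ h, 0.0))             # cov(w'y, h'y) = w'Vh = 0
```

The shape coefficients $h$ also sum to zero, since $h$ is proportional to $d$ minus its mean.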
In view of this zero correlation, the best discriminant function is of the form

$$\frac{d_1}{v_1}\,y_1 + \frac{d_2}{v_2}\,y_2, \tag{18}$$

where $y_1 = w'x$ (the 'size' variable) and $y_2 = h'x$ (the 'shape' variable). Also, $d_1$ is the difference in means for $y_1$ and $v_1$ its variance, and similarly for $y_2$.
Penrose has shown that even if V is not exactly of the homogeneous type (15), 
the above method often gives a very good discriminant function. Applying it 
to the numerical data referred to in section 3 above, for example, it will be found 
that we obtain estimates 

(19)
        Size weighting    Shape weighting    Final weighting
         (d₁w/v₁)           (d₂h/v₂)
x₁        1.4351            −2.3353            −0.9002
x₂        1.4351            −8.0664            −6.6313
x₃        1.4351            +5.9774             7.4125
x₄        1.4351            +4.4243             5.8594





It should be noted that the final weightings in (19) correspond with formula (18). They differ slightly from those given by Penrose (Table 5), who makes allowance for the observed correlation between $y_1$ and $y_2$. This allowance seems somewhat illogical and in any case rather a refinement. Thus Penrose's coefficients give a squared multiple correlation coefficient of 0.96334, whereas those in (19) give 0.96329, compared with the maximum of 0.96342 given in Section 3.

This method is much quicker than the exact method, but of course the full 
analysis, as has been indicated in Section 3, enables the most efficient yet 
economical discriminant function to be found. 


REFERENCES 


[1] J. SHERMAN AND W. J. MORRISON, "Adjustment of an inverse matrix corresponding to a change in one element of a given matrix", Annals of Math. Stat., Vol. 21 (1950), p. 124.

[2] J. SHERMAN AND W. J. MORRISON, "Adjustment of an inverse matrix corresponding to changes in the elements of a given column or a given row of the original matrix", abstract, Annals of Math. Stat., Vol. 20 (1949), p. 621.

[3] M. S. BARTLETT, "Further aspects of the theory of multiple regression", Proc. Camb. Phil. Soc., Vol. 34 (1938), p. 33.

[4] R. A. FISHER, "The use of multiple measurements in taxonomic problems", Annals of Eugenics, Vol. 7 (1936), p. 179.

[5] M. S. BARTLETT, "Multivariate analysis", Jour. Roy. Stat. Soc., Suppl., Vol. 9 (1947), p. 176.

[6] L. S. PENROSE, "Some notes on discrimination", Annals of Eugenics, Vol. 13 (1946-47), p. 228.















NOTES 
This section is devoted to brief research and expository articles and other short items. 




ON A THEOREM OF LYAPUNOV 


By DAVID BLACKWELL
Howard University 


The purpose of this note is to point out two extensions of the following theorem of Lyapunov¹ and to note an interesting statistical consequence of each.²

LYAPUNOV'S THEOREM: Let $\mu_1, \dots, \mu_n$ be non-atomic³ measures on a Borel field $\mathcal B$ of subsets of a space $X$. The set $R$ of vectors $[\mu_1(E), \dots, \mu_n(E)]$, $E \in \mathcal B$, is convex, i.e., if $r_1, r_2 \in R$, then $tr_1 + (1-t)r_2 \in R$ for $0 < t < 1$.

EXTENSION 1. Let $\mu_1, \dots, \mu_n$ be non-atomic measures on a Borel field $\mathcal B$ of subsets of a space $X$, and let $A$ be any subset of $n$-dimensional Euclidean space. Let $f = a(x) = [a_1(x), \dots, a_n(x)]$ be any $\mathcal B$-measurable function defined on $X$ with values in $A$, and define

$$v(f) = \left[\int a_1(x)\,d\mu_1, \dots, \int a_n(x)\,d\mu_n\right].$$

The set of vectors $v(f)$ is convex.

Lyapunov's theorem is the special case in which $A$ consists of the two points $(0, \dots, 0)$ and $(1, \dots, 1)$.


PROOF. Let $v(f_i) = v_i$, $f_i = [a_{i1}(x), \dots, a_{in}(x)]$, $i = 1, 2$, and consider the $2n$-dimensional measure

$$w(E) = \left[\int_E a_{11}(x)\,d\mu_1, \dots, \int_E a_{1n}(x)\,d\mu_n, \int_E a_{21}(x)\,d\mu_1, \dots, \int_E a_{2n}(x)\,d\mu_n\right].$$






















We have $w(N) = (0, \dots, 0)$, where $N$ is the null set, and $w(X) = (v_1, v_2)$. Hence, for any $t$, $0 < t < 1$, there is, by Lyapunov's theorem, a set $E \in \mathcal B$ with $w(E) = (tv_1, tv_2)$,


¹ "Sur les fonctions-vecteurs complètement additives," Bull. Acad. Sci. URSS, Sér. Math., Vol. 4 (1940), pp. 465-478. For a simplified proof of Lyapunov's results, see Halmos, "The range of a vector measure," Bull. Amer. Math. Soc., Vol. 54 (1948), pp. 416-421.

² Since this note was submitted, results obtained earlier by Dvoretzky, Wald, and Wolfowitz have appeared in the April 1950 Proceedings of the National Academy of Sciences. Their results are closely related to those presented here. They anticipate the general conclusion reached here: that in dealing with non-atomic distributions, mixed strategies are unnecessary. Their principal tool is also an extension of Lyapunov's theorem; their extension does not appear to contain or be contained in either of the extensions given here. The situation considered here is more general in that an infinite number of terminal actions are possible. It is more restricted in that only mixtures of a finite number of pure strategies are considered.

³ A measure $\mu$ is non-atomic if every set of non-zero measure has a subset of different non-zero measure.




THEOREM OF LYAPUNOV 113 


so that $w(CE) = [(1-t)v_1, (1-t)v_2]$, where $CE$ is the complement of $E$. Define $f = f_1$ on $E$ and $f = f_2$ on $CE$. Then $v(f) = tv_1 + (1-t)v_2$. This completes the proof.

This extension may be reformulated in statistical language, in the special case where $\mu_1, \dots, \mu_n$ are probability measures, as follows: In a statistical decision problem in which there are only a finite number of possible distributions, each of which is non-atomic, mixed strategies on the part of the statistician are unnecessary: anything which can be achieved with mixed strategies can already be achieved with pure strategies.

In amplification, $\mu_1, \dots, \mu_n$ are probability distributions, and $x$ is an observation chosen according to one of them. Having observed $x$, the statistician must choose an action $d$ from a set $D$ of possible actions. His loss in choosing an action $d$ is $a(1, d), \dots, a(n, d)$ when the true distribution of $x$ is $\mu_1, \dots, \mu_n$, respectively. Thus the choice of $d$ may be described as choosing a point $a \in A$, the subset of $n$-dimensional space consisting of the set of loss vectors

$$[a(1, d), \dots, a(n, d)], \quad d \in D.$$

Of course several points $d$ may lead to the same $a$. From our point of view, two $d$'s with the same $a$ may be identified, so that it is no loss of generality to consider $A$ itself as the set of possible actions.

A strategy for the statistician is then a function $f = a(x)$ from $X$ into $A$, specifying the action to be taken (i.e., the loss vector to be chosen) when $x$ is observed. We shall consider only $\mathcal B$-measurable strategies $f$. The expected loss vector from a strategy $f$ is

$$v(f) = \left[\int a_1(x)\,d\mu_1, \dots, \int a_n(x)\,d\mu_n\right];$$

the $i$-th component is the expected loss from $f$ when the true distribution is $\mu_i$. Thus the range $R$ of $v(f)$ is the set of expected loss vectors attainable with pure strategies $f$. Mixed strategies use strategies $f_1, \dots, f_k$ with probabilities $p_1, \dots, p_k$, where $p_i \ge 0$ and $\sum p_i = 1$. By such strategies, the statistician can attain all vectors in the convex hull of $R$, and only those. Thus if $R$ is already convex, nothing is gained by the use of mixed strategies.⁴
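A crude numerical illustration of this conclusion follows. It is a Python sketch under stated assumptions, not the construction used in the proof. Take two hypothetical absolutely continuous distributions on $[0, 1]$ and two hypothetical pure strategies. Use the first strategy on the first fraction $t$ of each of many small cells, and the second strategy elsewhere. The expected loss vector of this non-randomized rule then approaches that of the randomized mixture as the cells shrink. Lyapunov's theorem is what guarantees exact attainment. Every density, strategy and grid size below is an assumption chosen for illustration.

```python
import numpy as np

# Hypothetical example: X = [0, 1] with two non-atomic distributions given by densities.
def density(i, x):
    return np.ones_like(x) if i == 0 else 2.0 * x   # uniform; triangular

# Two hypothetical pure strategies; each maps x to a loss vector (a_1(x), a_2(x)).
def strategy_1(x):
    return np.stack([x, 1.0 - x])

def strategy_2(x):
    return np.stack([1.0 - x**2, x**2])

def expected_loss(rule, x, dx):
    """v(f) = (int a_1 dmu_1, int a_2 dmu_2), by the midpoint rule on the grid x."""
    a = rule(x)
    return np.array([np.sum(a[i] * density(i, x)) * dx for i in range(2)])

M = 200_000                       # quadrature points
x = (np.arange(M) + 0.5) / M      # midpoints of [0, 1]
dx = 1.0 / M
t, K = 0.3, 1000                  # mixing probability, number of cells

# Randomized rule: strategy_1 with probability t, strategy_2 otherwise.
mixed = t * expected_loss(strategy_1, x, dx) + (1 - t) * expected_loss(strategy_2, x, dx)

# Non-randomized rule: strategy_1 on the first fraction t of each cell, else strategy_2.
def pure_rule(x_):
    use_first = (K * x_) % 1.0 < t
    return np.where(use_first, strategy_1(x_), strategy_2(x_))

pure = expected_loss(pure_rule, x, dx)
print(mixed, pure, np.max(np.abs(mixed - pure)))
```

The discrepancy shrinks roughly in proportion to the cell width $1/K$.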

Sequential sampling. The above discussion applies directly only to the action to be taken after a sample point $x$ has been obtained, sequentially or otherwise. It asserts that, in the non-atomic case, nothing is gained by mixing actions. It is still possible that a mixture of sampling plans might achieve an expected loss vector not attainable with any one sampling plan, even with non-atomic distributions. An example of such a mixture is tossing a coin to decide whether to take another observation. It turns out, however, that nothing is gained by mixing sampling plans, provided all sampling plans provide for at least one observation and the distributions of this observation are non-atomic. Formally, we have the


⁴ It has been shown by the author, in a paper submitted to the Proceedings of the American Mathematical Society, that if $A$ is closed, $R$ is closed. Closure of $R$ implies that a minimax strategy for the statistician exists.





114 DAVID BLACKWELL 


THEOREM: Let $x = (x_1, x_2, \dots)$ be a sequence of chance variables whose joint distribution is one of $n$ probability distributions $\mu_1, \dots, \mu_n$. Let $S_1, \dots, S_N$ be $N$ sequential decision functions, each requiring the observation of $x_1$, and suppose the distributions of $x_1$ under $\mu_1, \dots, \mu_n$ are non-atomic. Then any expected loss vector attainable from a mixture of $S_1, \dots, S_N$ is also attainable from a single decision function $S$.

PROOF. Let $d_{ij}(x)$ be the loss from $S_j$ when the distribution of $x$ is $\mu_i$. (The loss is a function of $x$ as well as of $i$ and $j$, since the cost of observations may vary with $x$.) Then $a_j = (Ed_{1j}, \dots, Ed_{nj})$ is the expected loss vector from $S_j$. Since $S_1, \dots, S_N$ all involve observing $x_1$, the statistician need not decide which decision procedure to use until after $x_1$ is observed. A possible decision procedure is therefore a division $\mathcal D$ of sample space into $N$ mutually exclusive $x_1$-sets $D_1, \dots, D_N$, with decision procedure $S_j$ used if $x_1 \in D_j$. The expected loss vector from $\mathcal D$ is

$$v(\mathcal D) = \left(\sum_{j=1}^{N}\int_{D_j}\phi_{1j}(x_1)\,d\mu_1(x_1), \dots, \sum_{j=1}^{N}\int_{D_j}\phi_{nj}(x_1)\,d\mu_n(x_1)\right),$$

where $\phi_{ij}(x_1)$ is the conditional expectation of $d_{ij}$ with respect to $x_1$. If $\mathcal D$ is the decision procedure with $D_j = $ space $X$ and $D_i = $ null set for $i \ne j$, then $v(\mathcal D) = a_j$. Thus it is sufficient to show that the range of $v(\mathcal D)$ is convex.

The convexity of the range of $v(\mathcal D)$ is the special case, where $\mu_1, \dots, \mu_n$ are probability measures, of

EXTENSION 2. Let $\mu_1, \dots, \mu_n$ be non-atomic measures on a Borel field $\mathcal B$ of subsets of a space $X$. Let $\phi_{ij}(x)$, $i = 1, \dots, n$, $j = 1, \dots, N$, be $\mathcal B$-measurable functions of $x$ such that $\phi_{ij}$ is $\mu_i$-integrable over $X$. Let $\mathcal D = (D_1, \dots, D_N)$ be a decomposition of $X$ into $N$ disjoint subsets, and define

$$v(\mathcal D) = \left(\sum_{j=1}^{N}\int_{D_j}\phi_{1j}\,d\mu_1, \dots, \sum_{j=1}^{N}\int_{D_j}\phi_{nj}\,d\mu_n\right).$$

The range of $v(\mathcal D)$ is convex.
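PROOF. Let $\mathcal D_k = (D_{k1}, \dots, D_{kN})$, $k = 1, 2$, be two decompositions. We must show that for any $t$, $0 < t < 1$, there is a $\mathcal D$ with $v(\mathcal D) = tv(\mathcal D_1) + (1-t)v(\mathcal D_2)$. Write $m_{ij}(B) = \int_B \phi_{ij}\,d\mu_i$, and consider the $2nN$-dimensional measure $w(B) = [m_{ij}(BD_{kj})]$, $i = 1, \dots, n$, $j = 1, \dots, N$, $k = 1, 2$. Since $w$ is non-atomic, Lyapunov's theorem asserts that there is a $B$ with $w(B) = tw(X)$, i.e., $m_{ij}(BD_{kj}) = tm_{ij}(D_{kj})$. Then $m_{ij}(C(B)D_{kj}) = (1-t)m_{ij}(D_{kj})$. Define $D_j = BD_{1j} + C(B)D_{2j}$, $j = 1, \dots, N$, and $\mathcal D = (D_1, \dots, D_N)$. Then

$$\begin{aligned} v(\mathcal D) &= \sum_{j=1}^{N}[m_{1j}(D_j), \dots, m_{nj}(D_j)] \\ &= t\sum_{j=1}^{N}[m_{1j}(D_{1j}), \dots, m_{nj}(D_{1j})] + (1-t)\sum_{j=1}^{N}[m_{1j}(D_{2j}), \dots, m_{nj}(D_{2j})] \\ &= tv(\mathcal D_1) + (1-t)v(\mathcal D_2). \end{aligned}$$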





TEST OF SERIAL CORRELATION 115 


A NOTE ON THE TEST OF SERIAL CORRELATION COEFFICIENTS 
By MASAMI OGAWARA


Meteorological Research Institute, Tokyo


1. Summary. In this note the author points out that, in the case of a stationary Gaussian Markov process, i.e., an autoregressive stochastic process, we can test the serial correlation coefficients by a method based on normal regression theory. In particular, in the case of a simple Markov process, we can find confidence limits for its autocorrelation coefficient.

In this method, so far as random variables are concerned, not all the information in the original data is used, with a consequent reduction of degrees of freedom. However, the remaining part of the information enters as parameters in the distribution functions of the random variables and in the statistic used for tests.


2. Introduction. For the test of the serial correlation coefficient, a method based on its distribution may be regarded as orthodox. Up to the present, however, many investigations along this line seem to be confined by at least one of the following restrictions. These include R. L. Anderson [1], M. H. Quenouille [2], P. A. P. Moran [3], T. W. Anderson [4] and others.

(1) circular definition,

(2) significance test, i.e., testing the uncorrelatedness of the process,

(3) approximate distribution.

In this paper, we do not use the distribution of a serial correlation coefficient itself. Instead we use normal regression theory, and we give a general testing method for an autoregressive stochastic process.


3. Fundamental theorems. The following theorems are fundamental in our 
method. 

THEOREM 1. Let $x_t$ ($t = \dots, -1, 0, 1, 2, \dots$) be a simple Markov process. If the values of $x_{2k}$ ($k = 1, 2, \dots, n+1$) are fixed, the random variables $x_{2k+1}$ ($k = 1, 2, \dots, n$) are mutually independent.¹

This theorem is easily proved from the following facts:

(1) When the value of $x_0$ is given, $x_1, \dots, x_n$ are independent of $x_{-1}, x_{-2}, \dots$.

(2) When $x_0$ is given, the stochastic sequence $x_1, x_2, \dots$ is also a simple Markov process for the reversed time scale.

Similarly, the following general theorem holds: 

THEOREM 2. Let $x_t$ ($t = \dots, -1, 0, 1, 2, \dots$) be a Markov process of order $h$. Suppose the values of $x_{k(h+1)-h}, \dots, x_{k(h+1)-1}, x_{k(h+1)+1}, \dots, x_{k(h+1)+h}$ ($k = 1, 2, \dots$) are given. Then the random variables $x_{k(h+1)}$ ($k = 1, 2, \dots, n$) are mutually independent.


¹ This fact has been used by U. V. Linnik (without proof) in his proof of the central limit theorem for simple Markov processes, Izvestiya Akad. Nauk. USSR., Ser. Mat., Vol. 13 (1949).








116 MASAMI OGAWARA 







THEOREM 3.² Let $x_t$ $(t = \dots, -1, 0, 1, \dots)$ be a stationary Gaussian process. A necessary and sufficient condition that $x_t$ should be a non-singular Markov process of order $h$ is that its autocorrelation coefficients $\rho_r$ satisfy the finite difference equation

$$\rho_r + a_1\rho_{r-1} + \cdots + a_h\rho_{r-h} = 0, \qquad r = 1, 2, \dots; \quad a_h \neq 0, \tag{1}$$

where the $a$'s are such that every root of the equation

$$u^h + a_1 u^{h-1} + \cdots + a_{h-1}u + a_h = 0$$

lies within the unit circle.
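For $h = 2$ both conditions of Theorem 3 can be checked numerically. The sketch below is our illustration, not part of the paper, and the function names are hypothetical: one helper tests whether both roots of $u^2 + a_1u + a_2 = 0$ lie inside the unit circle, the other generates $\rho_r$ from (1), using $\rho_{-1} = \rho_1$ to start the recursion at $r = 1$.

```python
import cmath


def is_stationary_ar2(a1, a2):
    """True if both roots of u^2 + a1*u + a2 = 0 lie strictly inside the unit circle."""
    disc = cmath.sqrt(a1 * a1 - 4 * a2)
    roots = ((-a1 + disc) / 2, (-a1 - disc) / 2)
    return all(abs(u) < 1 for u in roots)


def ar2_autocorrelations(a1, a2, r_max):
    """rho_0, ..., rho_{r_max} satisfying rho_r + a1*rho_{r-1} + a2*rho_{r-2} = 0.

    At r = 1 the equation reads rho_1 + a1 + a2*rho_1 = 0 (because rho_{-1} = rho_1),
    so rho_1 = -a1 / (1 + a2); the later terms follow from the recursion itself.
    """
    rho = [1.0, -a1 / (1 + a2)]
    for r in range(2, r_max + 1):
        rho.append(-a1 * rho[r - 1] - a2 * rho[r - 2])
    return rho[: r_max + 1]


if __name__ == "__main__":
    # Order-1 special case: a1 = -rho, a2 = 0 gives rho_r = rho**r.
    print(ar2_autocorrelations(-0.5, 0.0, 3))  # [1.0, 0.5, 0.25, 0.125]
    print(is_stationary_ar2(-0.5, 0.2), is_stationary_ar2(-1.5, 0.5))
```

The second pair of coefficients has a root exactly on the unit circle ($u = 1$), so it fails the condition of the theorem.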

















4. The case of a stationary Gaussian simple Markov process. Let $m$, $\sigma^2$ and $\rho_r (= \rho^r)$ be the mean, variance and autocorrelation coefficient, respectively, of a stationary Gaussian simple Markov process $x_t$ with discrete parameter $t$. According to Theorem 1, when the values of $x_{2k-1}$ $(k = 1, 2, \dots, n+1)$ are fixed, the $x_{2k}$ $(k = 1, 2, \dots, n)$ are mutually independent and, in this case, their conditional probability densities are given by

$$f(x_{2k} \mid x_{2k-1}, x_{2k+1}) = \frac{1}{\sqrt{2\pi}\,\sigma_0}\exp\left[-\frac{1}{2\sigma_0^2}\{x_{2k} - (a + b x'_k)\}^2\right] \qquad (k = 1, 2, \dots, n),$$

where

$$a = \frac{m(1-\rho)^2}{1+\rho^2}, \qquad b = \frac{2\rho}{1+\rho^2}, \qquad \sigma_0^2 = \frac{\sigma^2(1-\rho^2)}{1+\rho^2}, \tag{2}$$

$$x'_k = (x_{2k-1} + x_{2k+1})/2.$$


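Considering the $x'_k$ as fixed variates and applying normal regression theory [6], the maximum likelihood estimates of the parameters $a$, $b$ and $\sigma_0^2$ are given by

$$\hat a = \bar x_2 - \hat b\,\bar x', \qquad \hat b = \frac{\sum_{k=1}^{n}(x'_k - \bar x')(x_{2k} - \bar x_2)}{\sum_{k=1}^{n}(x'_k - \bar x')^2}, \qquad \hat\sigma_0^2 = \frac{1}{n}\sum_{k=1}^{n}(x_{2k} - \hat a - \hat b x'_k)^2, \tag{3}$$

where

$$\bar x_2 = \sum_{k=1}^{n} x_{2k}/n, \qquad \bar x' = \sum_{k=1}^{n} x'_k/n = (\bar x_1 + \bar x_3)/2, \qquad \bar x_1 = \sum_{k=1}^{n} x_{2k-1}/n, \qquad \bar x_3 = \sum_{k=1}^{n} x_{2k+1}/n.$$

² M. Ogawara [5].

TEST OF SERIAL CORRELATION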
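We can rewrite $\hat b$ as follows:

$$\hat b = \frac{2r_1}{1 + r_2}, \tag{4}$$

where

$$r_1 = \frac{\sum_{k=1}^{n}(x_{2k-1} - \bar x_1)(x_{2k} - \bar x_2) + \sum_{k=1}^{n}(x_{2k} - \bar x_2)(x_{2k+1} - \bar x_3)}{\sum_{k=1}^{n}(x_{2k-1} - \bar x_1)^2 + \sum_{k=1}^{n}(x_{2k+1} - \bar x_3)^2},$$

$$r_2 = \frac{2\sum_{k=1}^{n}(x_{2k-1} - \bar x_1)(x_{2k+1} - \bar x_3)}{\sum_{k=1}^{n}(x_{2k-1} - \bar x_1)^2 + \sum_{k=1}^{n}(x_{2k+1} - \bar x_3)^2}. \tag{5}$$

When $1 - \hat b^2 > 0$ and $\hat b \neq 0$, inverting (2) gives the maximum likelihood estimates of $m$, $\sigma^2$ and $\rho$:

$$\hat m = \frac{\hat a}{1 - \hat b}, \qquad \hat\sigma^2 = \frac{\hat\sigma_0^2}{\sqrt{1 - \hat b^2}}, \qquad \hat\rho = \frac{1 - \sqrt{1 - \hat b^2}}{\hat b}. \tag{6}$$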


Since, as a function of the random variables $x_{2k}$,

$$\frac{(\hat b - b)^2 \sum_{k=1}^{n}(x'_k - \bar x')^2\,(n-2)}{\sum_{k=1}^{n}(x_{2k} - \hat a - \hat b x'_k)^2}$$

has the $F$-distribution with 1 and $n - 2$ degrees of freedom, we can test the hypothesis $\rho = \rho_0$, or equivalently $b = b_0 = 2\rho_0/(1 + \rho_0^2)$. As the function $\rho = (1 - \sqrt{1 - b^2})/b$ is monotone increasing, we can also find confidence limits for $\rho$ from those for $b$.
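The whole procedure of this section fits in a few lines. The sketch below is ours and not from the paper; `simulate_ar1`, `ogawara_ar1`, `f_statistic` and `rho_from_b` are hypothetical names, and the simulated series (mean 5, variance 4, $\rho = 0.6$) is only a test input. It regresses $x_{2k}$ on $x'_k = (x_{2k-1} + x_{2k+1})/2$ as in (3), converts $\hat b$ to $\hat m$, $\hat\sigma^2$, $\hat\rho$ through (6), and forms the $F$ statistic for a hypothesized $b_0$.

```python
import math
import random


def simulate_ar1(length, m, sigma, rho, seed=0):
    """Stationary Gaussian AR(1) path x_1, ..., x_length with mean m and variance sigma**2."""
    rng = random.Random(seed)
    innov_sd = sigma * math.sqrt(1 - rho * rho)
    x = [m + sigma * rng.gauss(0.0, 1.0)]  # start in the stationary law
    for _ in range(length - 1):
        x.append(m + rho * (x[-1] - m) + innov_sd * rng.gauss(0.0, 1.0))
    return x


def ogawara_ar1(x):
    """Regression of x_{2k} on x'_k, then estimates (3) and (6); x holds x_1..x_{2n+1}."""
    n = (len(x) - 1) // 2
    y = [x[2 * k - 1] for k in range(1, n + 1)]                   # x_{2k}
    z = [(x[2 * k - 2] + x[2 * k]) / 2 for k in range(1, n + 1)]  # x'_k
    y_bar, z_bar = sum(y) / n, sum(z) / n
    s_zz = sum((zk - z_bar) ** 2 for zk in z)
    s_zy = sum((zk - z_bar) * (yk - y_bar) for zk, yk in zip(z, y))
    b = s_zy / s_zz
    a = y_bar - b * z_bar
    rss = sum((yk - a - b * zk) ** 2 for yk, zk in zip(y, z))
    root = math.sqrt(1 - b * b)  # needs |b| < 1
    return {
        "n": n, "a": a, "b": b, "s_zz": s_zz, "rss": rss,
        "sigma0_sq": rss / n,
        "m": a / (1 - b),
        "sigma_sq": (rss / n) / root,
        "rho": (1 - root) / b,  # needs b != 0
    }


def f_statistic(est, b0):
    """Statistic with the F(1, n - 2) distribution when b = b0."""
    return (est["b"] - b0) ** 2 * est["s_zz"] * (est["n"] - 2) / est["rss"]


def rho_from_b(b):
    """Inverse of b = 2 rho / (1 + rho**2) on (-1, 1); increasing in b."""
    return 0.0 if b == 0 else (1 - math.sqrt(1 - b * b)) / b


if __name__ == "__main__":
    x = simulate_ar1(40001, m=5.0, sigma=2.0, rho=0.6, seed=1)
    est = ogawara_ar1(x)
    b0 = 2 * 0.6 / (1 + 0.6 ** 2)
    print(round(est["rho"], 3), round(f_statistic(est, b0), 2))
```

Because `rho_from_b` is increasing, confidence limits for $b$ map directly to limits for $\rho$, which is the point made in the text.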


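5. The case of a stationary Gaussian Markov process of order h. Let, as before, $m$, $\sigma^2$ and $\rho_r$ be the mean, variance and autocorrelation coefficients of our process $x_t$, respectively. From Theorem 2, the random variables $x_{k(h+1)}$ $(k = 1, 2, \dots, n)$ are independent of each other under the condition that the variables $x_{k(h+1)-p}$, $x_{k(h+1)+p}$ $(p = 1, 2, \dots, h;\ k = 1, 2, \dots, n)$ are fixed, and, in the present case, their conditional probability densities are given by

$$f(x_{k(h+1)} \mid x_{k(h+1)-p}, x_{k(h+1)+p};\ p = 1, 2, \dots, h) = \frac{1}{\sqrt{2\pi}\,\sigma_0}\exp\left[-\frac{1}{2\sigma_0^2}\Big\{x_{k(h+1)} - \sum_{p=0}^{h} b_p x'_{pk}\Big\}^2\right] \qquad (k = 1, 2, \dots, n), \tag{8}$$

where $x'_{pk} = (x_{k(h+1)-p} + x_{k(h+1)+p})/2$ $(p = 1, 2, \dots, h)$, $x'_{0k} = 1$, and where

$$b_0 = m\Big(1 - \sum_{p=1}^{h} b_p\Big), \qquad b_p = 2c_p \quad (p = 1, 2, \dots, h),$$

the $c_p$ being determined by

$$\begin{pmatrix} 1 & \cdots & \rho_{h-1} & \rho_{h+1} & \cdots & \rho_{2h} \\ \vdots & & \vdots & \vdots & & \vdots \\ \rho_{h-1} & \cdots & 1 & \rho_2 & \cdots & \rho_{h+1} \\ \rho_{h+1} & \cdots & \rho_2 & 1 & \cdots & \rho_{h-1} \\ \vdots & & \vdots & \vdots & & \vdots \\ \rho_{2h} & \cdots & \rho_{h+1} & \rho_{h-1} & \cdots & 1 \end{pmatrix}\begin{pmatrix} c_h \\ \vdots \\ c_1 \\ c_1 \\ \vdots \\ c_h \end{pmatrix} = \begin{pmatrix} \rho_h \\ \vdots \\ \rho_1 \\ \rho_1 \\ \vdots \\ \rho_h \end{pmatrix},$$

and

$$\sigma_0^2 = \sigma^2\,\frac{1 + a_1\rho_1 + \cdots + a_h\rho_h}{1 + a_1^2 + \cdots + a_h^2}, \tag{9}$$

where the $a$'s are the coefficients of equation (1).

Considering the relations (1) and (9), the hypotheses concerning $\rho_1, \dots, \rho_h$ are equivalent to hypotheses concerning $c_1, \dots, c_h$ or $b_1, \dots, b_h$.³ Thus normal regression theory is applicable.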

Moreover, we can estimate the order of the Markov process as follows. The theory stated above holds whenever the true order $h_0$ of the process is not greater than $h$. Hence we may select as its order the value $h_0$ such that the hypothesis $b_{h_0} = b_{h_0+1} = \cdots = b_h = 0$ is rejected but the hypothesis $b_{h_0+1} = b_{h_0+2} = \cdots = b_h = 0$ is not rejected.
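The $c_p$ in (8) are the weights of the best linear interpolation of $x_t$ from its $2h$ neighbours $x_{t\pm1}, \dots, x_{t\pm h}$, i.e. the solution of the normal equations whose matrix has entries $\rho_{|i-j|}$. A small sketch of that computation, assuming the $\rho$'s are known, follows; the names are ours. For the AR(2) check we compare against the closed form $c_p = -(a_p + \sum_{j=1}^{h-p} a_j a_{j+p})/(1 + \sum_j a_j^2)$, a standard result for autoregressions that we assume here rather than one stated in the paper. The last case also illustrates the order-selection idea: an order-1 process fitted with $h = 2$ gives $c_2 = 0$.

```python
def solve_linear(A, y):
    """Solve A c = y by Gaussian elimination with partial pivoting (small dense A)."""
    n = len(y)
    M = [list(map(float, row)) + [float(y[i])] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    c = [0.0] * n
    for r in range(n - 1, -1, -1):
        c[r] = (M[r][n] - sum(M[r][j] * c[j] for j in range(r + 1, n))) / M[r][r]
    return c


def interpolation_coefficients(rho, h):
    """c_1, ..., c_h of (8) from rho[0..2h], where rho[0] = 1."""
    offsets = list(range(-h, 0)) + list(range(1, h + 1))  # neighbours t-h..t-1, t+1..t+h
    A = [[rho[abs(i - j)] for j in offsets] for i in offsets]
    y = [rho[abs(i)] for i in offsets]
    coef = solve_linear(A, y)
    # By symmetry the weight on x_{t-p} equals that on x_{t+p}; read c_p at offset +p.
    return [coef[h - 1 + p] for p in range(1, h + 1)]


def sigma0_sq_ratio(rho, c):
    """sigma_0^2 / sigma^2 = 1 - (sum over all 2h neighbours of weight * correlation)."""
    return 1 - 2 * sum(cp * rho[p] for p, cp in enumerate(c, start=1))


if __name__ == "__main__":
    print(interpolation_coefficients([1.0, 0.5, 0.25, 0.125, 0.0625], 2))  # about [0.4, 0.0]
```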


REFERENCES 

[1] R. L. Anderson, "Distribution of the serial correlation coefficient," Annals of Math. Stat., Vol. 13 (1942), pp. 1-13.

[2] M. H. Quenouille, "Some results in the testing of serial correlation coefficients," Biometrika, Vol. 35 (1948), pp. 261-267.

[3] P. A. P. Moran, "Some theorems on time series, II. The significance of the serial correlation coefficient," Biometrika, Vol. 35 (1948), pp. 255-260.

[4] T. W. Anderson, "On the theory of testing serial correlation," Skandinavisk Aktuarietidskrift, Vol. 31 (1948), pp. 88-116.

[5] M. Ogawara, "On the normal stationary Markov process of higher order," Bull. of Math. Stat. (in Japanese), Vol. 2 (1948), pp. 42-46.

[6] S. S. Wilks, Mathematical Statistics, Princeton University Press, 1943.


³ Owing to (1), this is also equivalent to the hypotheses concerning $a_1, \dots, a_h$.





PROBABILITY MEASURES 119 


REMARK ON SEPARABLE SPACES OF PROBABILITY MEASURES 
By AGNES BERGER 
New York City 


Early writers on mathematical statistics often had to assume that the distributions under consideration either admitted probability densities, sometimes subject to further regularity conditions, or that they were purely discrete; in general, two separate arguments were needed to deal with the two cases. More recent authors, however, have achieved greater generality and, at the same time, a unification of methods by dispensing altogether with assumptions on the distributions themselves and specifying, instead, their relation to each other. In particular, these writers assume (for example in [1], [2], [3]) that the probability measures under consideration form what is sometimes called a "dominated set of measures", defined as follows: Let $X$ be the sample space, $\mathfrak{B}$ a Borel field of subsets of $X$, and let $\Omega = \{m\}$ be a set of probability measures defined on $\mathfrak{B}$. $\Omega$ is called a dominated set of measures if there exists a $\sigma$-finite measure $\mu$ such that every $m$ in $\Omega$ is absolutely continuous with respect to $\mu$.

One of the important consequences of assuming that $\Omega$ is dominated is that, if such an $\Omega$ is metrized by introducing

$$d(m, m') = \sup_{B \in \mathfrak{B}} |m(B) - m'(B)|$$

as a metric and $\mathfrak{B}$ is a separable Borel field (as, for instance, in the case of the Borel sets in finite-dimensional Euclidean spaces), then $\Omega$ is separable with respect to the topology induced by $d$. (A proof of this can be based on Hopf's approximation theorem, as indicated in [1]; a proof for measures dominated by Lebesgue measure is referred to at the end of [4].)
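For intuition only, consider a finite sample space, where every subset belongs to $\mathfrak{B}$ (this is our simplifying assumption, not the setting of the note). There the supremum defining $d$ is attained at $B = \{x : m(x) > m'(x)\}$, so $d$ equals half the $L_1$ distance between the two probability vectors. The sketch below, with illustrative function names, checks the closed form against brute force over all subsets.

```python
from itertools import combinations


def d_brute_force(m1, m2):
    """sup over all subsets B of |m1(B) - m2(B)| for measures on a finite set."""
    points = range(len(m1))
    best = 0.0
    for size in range(len(m1) + 1):
        for subset in combinations(points, size):
            best = max(best, abs(sum(m1[i] - m2[i] for i in subset)))
    return best


def d_half_l1(m1, m2):
    """Closed form for two probability vectors: the sup is attained at B = {m1 > m2}."""
    return 0.5 * sum(abs(p - q) for p, q in zip(m1, m2))


if __name__ == "__main__":
    m1, m2 = [0.5, 0.3, 0.2], [0.2, 0.2, 0.6]
    print(d_brute_force(m1, m2), d_half_l1(m1, m2))
```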

Since the separability of dominated sets of measures is used to great advantage (for example in [1] and in [4]), one wonders whether there exist any separable sets of measures other than dominated ones. It will be shown, to the contrary, that an even weaker separability condition than the one described implies that the set is dominated. In order to state the exact theorem, we shall consider a set $M = \{m\}$ of probability measures defined on a common Borel field $\mathfrak{B}$ of subsets of some abstract space $X$ and introduce a weak topology into $M$ in the usual way (see [5]) by defining a complete system of neighborhoods as follows. For every $p$ in $M$, every finite collection of sets $B_1, B_2, \dots, B_k$ in $\mathfrak{B}$ and every $\epsilon > 0$, let $\alpha = (B_1, B_2, \dots, B_k; \epsilon)$ and let

$$V_\alpha(p) = \{m \text{ in } M : |m(B_i) - p(B_i)| < \epsilon,\ i = 1, 2, \dots, k\},$$

i.e., the set of all those measures in $M$ whose values on the sets $B_1, B_2, \dots, B_k$ differ by less than $\epsilon$ in absolute value from the corresponding values of $p$. $V_\alpha(p)$ is called the neighborhood of index $\alpha$ of the measure $p$. By letting $\alpha$ range over all possible finite collections of sets in $\mathfrak{B}$ and all positive numbers $\epsilon$, $V_\alpha(p)$ defines a complete system of neighborhoods (see for instance [6]), so that $M$ may be regarded as a topological space. We shall prove the following theorem:









THEOREM. If a set of measures M is separable with respect to the weak topology 
defined above then M is dominated. 


PROOF. By assumption, there exists a sequence of measures $\{m_i\}$ in $M$ such that to any given $p$ in $M$ and any given $\alpha$ there exists an $m_i$ in $V_\alpha(p)$. Let

$$\mu = \sum_{i=1}^{\infty} c_i m_i, \qquad 0 < c_i < 1, \qquad \sum_{i=1}^{\infty} c_i = 1;$$

then $\mu(X) = 1$. Let $B$ in $\mathfrak{B}$ be such that $\mu(B) = 0$. Obviously, $m_i(B) = 0$ for all $i$. Let $p$ be an arbitrary fixed measure in $M$ and consider the sequence of neighborhoods $V_{\alpha_j}(p)$, where $\alpha_j = (B; 1/2^j)$, $j = 1, 2, \dots$. Then for any fixed $j$ there exists an $m_i$ which is in $V_{\alpha_j}(p)$; thus

$$|m_i(B) - p(B)| < \frac{1}{2^j}.$$

Since $m_i(B) = 0$ for all $i$, $p(B) < 1/2^j$, and since $j$ was arbitrary this means $p(B) = 0$.

Thus whenever $\mu(B) = 0$ for some $B$ we have $p(B) = 0$ for every $p$ in $M$; that is, every $p$ in $M$ is absolutely continuous with respect to the finite measure $\mu$, as we wanted to prove.
Since a set of measures separable with respect to the metric topology induced by

$$d(m, m') = \sup_{B \in \mathfrak{B}} |m(B) - m'(B)|$$

is a fortiori separable in the weak topology, we can add the following theorem:

THEOREM. A necessary and sufficient condition for a set of measures defined 
over a separable Borel field to be separable with respect to the topology induced by 
the metric d is that it be a dominated set of measures. 


REFERENCES 


[1] E. L. Lehmann and H. Scheffé, "Completeness, similar regions and unbiased estimation," Sankhyā, to be published.

[2] P. R. Halmos and L. J. Savage, "Application of the Radon-Nikodym theorem to the theory of sufficient statistics," Annals of Math. Stat., Vol. 20 (1949), p. 225.

[3] E. L. Lehmann, "Some principles of the theory of testing hypotheses," Annals of Math. Stat., Vol. 21 (1950), p. 1.

[4] Abraham Wald, "Statistical decision functions," Annals of Math. Stat., Vol. 20 (1949), p. 165.

[5] J. von Neumann, "Zur Algebra der Funktionaloperatoren und Theorie der normalen Operatoren," Mathematische Annalen, Vol. 102 (1930), p. 370.

[6] F. Hausdorff, Grundzüge der Mengenlehre, Leipzig, 1914.

[7] E. Hopf, "Ergodentheorie," Ergebnisse der Mathematik und ihrer Grenzgebiete, Vol. 5, No. 2 (1937), p. 4.





SECOND EXTREME 


TABLE OF THE ASYMPTOTIC DISTRIBUTION 
OF THE SECOND EXTREME 
By E. J. Gumbel and J. Arthur Greenwood
New York City and Manhattan Life Insurance Company 

The asymptotic distributions of the extreme values taken from an initial 
distribution of the exponential type are now widely used, for example in flood 
control [6] and in problems connected with the breaking strength of material 
[1]. Therefore, the corresponding distribution of the penultimate (and of the 
second) value may also be of practical interest. 

Let $F(x)$ be the initial probability (distribution function) and let $f(x) = F'(x)$ be the initial density. Let $n$ be a large sample size, and let the rank $m$ $(m \ll n)$ be counted from the top. Finally, let the parameters $u_m$ and $\alpha_m$ be defined as the solutions of

$$F(u_m) = 1 - \frac{m}{n}, \qquad \alpha_m = \frac{n f(u_m)}{m}. \tag{1}$$

Then the asymptotic distribution $\varphi_m(x_m)$ of the $m$th largest value $x_m$ is [2]

$$\varphi_m(x_m) = \frac{m^m}{(m-1)!}\,\alpha_m \exp[-m y_m - m e^{-y_m}],$$

where

$$y_m = \alpha_m(x_m - u_m),$$

provided that the initial distribution is of the exponential type. The asymptotic distribution ${}_m\varphi({}_m x)$ of the $m$th smallest value ${}_m x$ is

$${}_m\varphi({}_m x) = \varphi_m(-{}_m x).$$

The probability function $\Phi_m(x_m)$ is obtained from

$$\Phi_m(x_m) = \frac{m^m}{(m-1)!}\int_{-\infty}^{y_m}\exp[-my - me^{-y}]\,dy = \frac{1}{(m-1)!}\int_{me^{-y_m}}^{\infty} z^{m-1}e^{-z}\,dz,$$

whence

$$\Phi_m(x_m) = 1 - I(t_m, m-1), \tag{2}$$

where

$$t_m = \sqrt{m}\,e^{-y_m},$$

and $I$ is the incomplete Gamma function ratio of Karl Pearson [5]. In the special case $m = 2$, the probability function of the penultimate value is

$$\Phi_2(x_2) = 1 - I(\sqrt{2}\,e^{-y_2}, 1). \tag{3}$$
The modal penultimate value is, of course, $u_2$, and the intervals corresponding to the probabilities $P_1 = 0.68269$, $P_2 = 0.95445$ and $P_3 = 0.99730$ are $y_2 = 0.75409$, $y_2 = 1.78196$ and $y_2 = 3.27883$, respectively.

The present five-decimal table (Table I) was computed by interpolation in Pearson's table. The last six lines indicate the first values of $y_2$ for which $\Phi_2$ differs from the value indicated by less than $5 \cdot 10^{-6}$. The table was checked by differencing and by comparison with the short table of percentage points (Table II), which was computed by noting that

$$2m e^{-y_m}$$

has the $\chi^2$ distribution with $2m$ degrees of freedom, and so transforming the percentage points given by Thompson [7] (pp. 188-189, line $\nu = 4$), setting

$$y_2 = \ln 4 - \ln \chi^2.$$

TABLE I
Probability $\Phi_2(y_2)$ of the penultimate value $y_2$

   y_2    Φ_2(y_2)   |   y_2    Φ_2(y_2)
  -1.95    .00001    |   0.55    .67935
  -1.90    .00002    |   0.60    .69990
  -1.85    .00004    |   0.65    .71954
  -1.80    .00007    |   0.70    .73827
  -1.75    .00013    |   0.75    .75608
  -1.70    .00021    |   0.80    .77297
  -1.65    .00034    |   0.85    .78896
  -1.60    .00054    |   0.90    .80406
  -1.55    .00084    |   0.95    .81829
  -1.50    .00128    |   1.00    .83167
  -1.45    .00189    |   1.05    .84424
  -1.40    .00274    |   1.10    .85601
  -1.35    .00389    |   1.15    .86703
  -1.30    .00542    |   1.20    .87731
  -1.25    .00742    |   1.25    .88690
  -1.20    .00998    |   1.30    .89584
  -1.15    .01322    |   1.35    .90414
  -1.10    .01723    |   1.40    .91185
  -1.05    .02213    |   1.45    .91901
  -1.00    .02803    |   1.50    .92563
  -0.95    .03503    |   1.55    .93176
  -0.90    .04324    |   1.60    .93743
  -0.85    .05274    |   1.65    .94266
  -0.80    .06359    |   1.70    .94749
  -0.75    .07586    |   1.75    .95193
  -0.70    .08958    |   1.80    .95603
  -0.65    .10477    |   1.85    .95979
  -0.60    .12141    |   1.90    .96326
  -0.55    .13947    |   1.95    .96644
  -0.50    .15891    |   2.00    .96935
  -0.45    .17965    |   2.05    .97203
  -0.40    .20160    |   2.10    .97448
  -0.35    .22466    |   2.15    .97673
  -0.30    .24871    |   2.20    .97879
  -0.25    .27362    |   2.25    .98067
  -0.20    .29924    |   2.30    .98239
  -0.15    .32543    |   2.35    .98397
  -0.10    .35206    |   2.40    .98540
  -0.05    .37898    |   2.45    .98671
   0.00    .40601    |   2.50    .98791
   0.05    .43305    |   2.55    .98900
   0.10    .45996    |   2.60    .99000
   0.15    .48662    |   2.65    .99091
   0.20    .51291    |   2.70    .99174
   0.25    .53873    |   2.75    .99249
   0.30    .56399    |   2.80    .99318
   0.35    .58860    |   2.85    .99380
   0.40    .61249    |   2.90    .99437
   0.45    .63561    |   2.95    .99489
   0.50    .65791    |   3.00    .99536

(The continuation of Table I for $y_2 \ge 3.05$ is not legible in the source.)

TABLE II
Probability points

  Φ_2(x_2)      y_2
    .005     -1.31239
    .010     -1.19972
    .025     -1.02454
    .050     -0.86371
    .100     -0.66519
    .250     -0.29737
    .500      0.17534

FIGURE 1. Probability paper for the penultimate value (abscissa: reduced variate $y_2$).

More decimal places may be obtained by direct substitution in (3), by use of the relation

$$\Phi_2(x_2) = \Phi_1(z) + \varphi_1(z), \tag{4}$$

where $z = y_2 - \ln 2$ and $\Phi_1(z) = \exp(-e^{-z})$ and $\varphi_1(z) = e^{-z}\exp(-e^{-z})$ are, respectively, the probability and density of the largest value, given in a seven-decimal table originally calculated by Greenwood [4], and from the nine-decimal table of $(x + 1)e^{-x}$ by Miller and Rosebrugh [3], pp. 80-101, where

$$x = 2e^{-y_2}.$$
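A direct evaluation, offered as a sketch (function names are ours): for integer $m$, formula (2) equals the upper tail of a Gamma$(m)$ variable, $\Phi_m = e^{-z}\sum_{k<m} z^k/k!$ with $z = m e^{-y_m}$, which for $m = 2$ is $(1 + x)e^{-x}$ with $x = 2e^{-y_2}$. The code reproduces entries of Tables I and II (for example $\Phi_2(0) = 3e^{-2} \approx 0.40601$), inverts $\Phi_m$ by bisection for percentage points, and checks relation (4).

```python
import math


def phi_m_cdf(y, m):
    """Asymptotic probability Phi_m of the m-th largest value at reduced variate y.

    Equivalent to (2): with z = m * exp(-y), Phi_m = exp(-z) * sum_{k<m} z**k / k!.
    """
    z = m * math.exp(-y)
    term = total = 1.0
    for k in range(1, m):
        term *= z / k
        total += term
    return math.exp(-z) * total


def gumbel_density(z):
    """phi_1: density of the largest value in the reduced variate."""
    return math.exp(-z - math.exp(-z))


def phi_m_quantile(prob, m, lo=-10.0, hi=40.0):
    """Reduced variate y with Phi_m(y) = prob, by bisection (Phi_m is increasing)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if phi_m_cdf(mid, m) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print(round(phi_m_cdf(0.0, 2), 5), round(phi_m_quantile(0.5, 2), 5))
```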


Table I is basic for the construction of a probability paper (Figure 1) for the penultimate value, which can be used in the same way as the probability paper of the largest value [6]. If the variate is replaced by its logarithm, the paper may be applied to the penultimate value taken from a distribution of the Cauchy type. Finally, if the probability $\Phi_2$ is replaced by $1 - \Phi_2$, the paper may serve for the second value.


REFERENCES 


[1] B. Epstein, "Application of the theory of extreme values in fracture problems," Jour. Am. Stat. Assn., Vol. 43 (1948), pp. 403-412.

[2] E. J. Gumbel, "Les valeurs extrêmes des distributions statistiques," Annales de l'Institut Henri Poincaré, Vol. 4 (1935), pp. 115-158.

[3] W. L. Miller and T. R. Rosebrugh, "Numerical values of certain functions involving $e^{-x}$," Trans. of Roy. Soc. Canada, Ser. 2, Vol. 9, Sect. 3 (1903), pp. 73-107.

[4] National Bureau of Standards, Tables Related to Extreme Values, in preparation.

[5] K. Pearson, Tables of the Incomplete Gamma Function, 2nd re-issue, Cambridge University Press, 1946.

[6] S. E. Rantz and H. C. Riggs, Magnitude and Frequency of Floods in the Columbia River Basin, Geological Survey Water Supply Paper 1080, Government Printing Office, 1949.

[7] C. M. Thompson, "Table of percentage points of the $\chi^2$-distribution," Biometrika, Vol. 32 (1941), pp. 187-191.





DISTRIBUTION OF MAXIMUM DEVIATION 125 


THE DISTRIBUTION OF THE MAXIMUM DEVIATION BETWEEN TWO 
SAMPLE CUMULATIVE STEP FUNCTIONS 


By Frank J. Massey, Jr.¹


University of Oregon 


1. Summary. Let $x_1 < x_2 < \cdots < x_n$ and $y_1 < y_2 < \cdots < y_m$ be the ordered results of two random samples from populations having continuous cumulative distribution functions $F(x)$ and $G(x)$, respectively. Let $S_n(x) = k/n$ when $k$ is the number of observed values of $X$ which are less than or equal to $x$, and similarly let $S'_m(x) = j/m$, where $j$ is the number of observed values of $Y$ which are less than or equal to $x$.

The statistic $d = \max_x |S_n(x) - S'_m(x)|$ can be used to test the hypothesis $F(x) = G(x)$, where the hypothesis would be rejected if the observed $d$ is significantly large. The limiting distribution of $d\sqrt{nm/(n+m)}$ has been derived [1], [4] and tabled [5]. In this paper a method of obtaining the exact distribution of $d$ for small samples is described, and a short table for equal-size samples is included. The general technique is that used by the author for the single-sample case [2]. There is a lower bound to the power of the test against any specified alternative [3]. This lower bound approaches one as $n$ and $m$ approach infinity, proving that the test is consistent.

2. Distribution of $d$. Denote by $a_1$ the number of observed values of $Y$ which are less than $x_1$, by $a_2$ the number of values of $Y$ which are between $x_1$ and $x_2$, ..., and by $a_{n+1}$ the number of values of $Y$ which are greater than $x_n$. It is known that, if the hypothesis $F(x) = G(x)$ is true, the probability of the occurrence of any set $a_1, \dots, a_{n+1}$ is $n!\,m!/(m+n)!$. Thus the probability that $d \le a$ can be found by counting the number of sets $a_1, \dots, a_{n+1}$ which give values of $d \le a$ and multiplying this number by $n!\,m!/(m+n)!$. The method of counting is illustrated here for $n = m$, and some results are given in Table 1. If $n = m$, then $S_n(x)$ and $S'_n(x)$ can only differ by multiples of $1/n$. (If $n \neq m$ they can only differ by multiples of $1/mn$.) For integer $k$ we count the number of sets $a_1, \dots, a_{n+1}$ such that $d \le k/n$.

Denote by $U_i(j)$, $j = 1, 2, \dots, n$, $i = 0, 1, 2, \dots, 2k-1$, the number of sets of possible $a_1, a_2, \dots, a_j$ such that $S'_n(x_j) = (j + i - k)/n$ and such that $|S_n(x) - S'_n(x)|$ has been less than or equal to $k/n$ for $x \le x_j$. It is easily seen that these $U_i(j)$ satisfy the following difference equations:

$$\begin{aligned} U_0(j+1) &= U_0(j) + U_1(j), \\ U_1(j+1) &= U_0(j) + U_1(j) + U_2(j), \\ &\ \ \vdots \\ U_{2k-2}(j+1) &= U_0(j) + \cdots + U_{2k-1}(j), \\ U_{2k-1}(j+1) &= U_0(j) + \cdots + U_{2k-1}(j). \end{aligned}$$
1 Research under contract N6-onr-218/IV with the Office of Naval Research. 





TABLE 1
Probability of $d \le k/n$

  n      k = 1      k = 2      k = 3      k = 4      k = 5      k = 6
  1   1.000000
  2    .666667   1.000000
  3    .400000    .900000   1.000000
  4    .228571    .771429    .971429   1.000000
  5    .126984    .642857    .920635    .992063   1.000000
  6    .069264    .525974    .857143    .974026    .997835   1.000000
  7    .037296    .424825    .787879    .946970    .991841    .999417
  8    .019891    .339860    .717327    .912976    .981352    .997514
  9    .010537    .269889    .648293    .874126    .966434    .993706
 10    .005542    .213070    .582476    .832179    .947552    .987659
 11    .002903    .167412    .520850    .788524    .925339    .979261
 12    .001515    .131018    .463902    .744225    .900453    .968564
 13    .000788    .102194    .411804    .700080    .873512    .955728
 14    .000408    .079484    .364515    .656680    .845065    .940970
 15    .000211    .061669    .321862    .614453    .815584    .924536
 16    .000109    .047744    .283588    .573707    .785465    .906674
 17    .000056    .036893    .249393    .534647    .755041    .887623
 18    .000029    .028460    .218952    .497410    .724582    .867606
 19    .0⁴148     .021922    .191938    .462071    .694311    .846827
 20    .0⁵761     .016863    .168030    .428664    .664409    .825467
 21    .0⁵390     .012956    .146921    .397187    .635020    .803688
 22    .0⁵199     .009943    .128321    .367614    .606260    .781632
 23    .0⁵102     .007623    .111963    .339899    .578218    .759422
 24    .0⁶52      .005839    .097600    .313983    .550963    .737166
 25    .0⁶27      .004468    .085007    .289796    .524546    .714958
 26    .0⁶14      .003417    .073980    .267263    .499005    .692877
 27    .0⁷69      .002611    .064338    .246303    .474362    .670992
 28    .0⁷35      .001994    .055914    .226833    .450633    .649362
 29    .0⁷18      .001522    .048563    .208772    .427823    .628036
 30    .0⁸91      .001161    .042154    .192037    .405929    .607055
 31    .0⁸46      .000885    .036570    .176546    .384946    .586455
 32    .0⁸23      .000674    .031710    .162223    .364861    .566264
 33    .0⁸12      .000513    .027483    .148989    .345657    .546505
 34    .0⁹60      .000391    .023808    .136773    .327316    .527198
 35    .0⁹31      .000297    .020616    .125505    .309816    .508355
 36    .0⁹16      .000226    .017845    .115120    .293133    .489989
 37    .0¹⁰79     .000172    .015440    .105553    .277243    .472107
 38    .0¹⁰40     .000131    .013355    .096747    .262121    .454713
 39    .0¹⁰20     .000099    .011547    .088645    .247738    .437811
 40    .0¹⁰10     .000075    .009981    .081195       …          …

(.0⁴148 denotes .0000148, and similarly for the other entries of this form.)

TABLE 1 (continued)

  n      k = 7      k = 8      k = 9     k = 10     k = 11     k = 12
  7   1.000000
  8    .999845   1.000000
  9    .999260    .999959   1.000000
 10    .997943    .999783    .999989   1.000000
 11    .995634    .999345    .999938    .999997   1.000000
 12    .992141    .998503    .999796    .999982    .999999   1.000000
 13    .987351    .997125    .999500    .999938    .999995   1.000000
 14    .981218    .995100    .998979    .999837    .999981    .999999
 15    .973752    .992344    .998163    .999647    .999948    .999994
 16    .965002    .988801    .996985    .999330    .999880    .999983
 17    .955047    .984439    .995389    .998847    .999762    .999960
 18    .943982    .979252    .993331    .998160    .999571    .999917
 19    .931911    .973251    .990776    .997233    .999286    .999844
 20    .918942    .966458    .987701    .996033    .998884    .999729
 21    .905183    .958911    .984095    .99453     .99834     .99956
 22    .890738    .950653    .979953    .99271     .99764     .99933
 23    .875705    .941731    .975280    .99055     .99676     .99901
 24    .860177    .932197    .970087    .98803     .99568     .99860
 25    .844240    .922101    .964389    .98516     .99438     .99808
 26    .827971    .911498    .958206    .98193     .99287     .99744
 27    .811443    .900437    .951562    .97833     .99111     .99667
 28    .794722    .888969    .944481    .97438     .98911     .99576
 29    .777865    .877140    .936989    .97007     .98686     .99469
 30    .760927    .864996    .929113    .96542     .98436     .99346
 31    .743955    .852580    .920880    .96044     .98160     .9921
 32    .726992    .839930    .912319    .95514     .97859     .9905
 33    .710076    .827086    .903455    .94953     .97533     .9888
 34    .693242    .814080    .894315    .94363     .97182     .9868
 35    .676519    .800946    .884924    .93745     .96807     .9847
 36    .659934    .787713    .875307    .93101     .96407     .9824
 37    .643512    .774409    .865487    .92432     .95985     .9799
 38    .627273    .761059    .855487    .91740     .95540     .9773
 39    .611234    .747687    .845327    .91027     .95074     .9744
 40    .595413    .734313    .835029    .90293     .94587     .9714













For small $n$ these equations can be solved by iteration, which was done in constructing Table 1. Initial conditions are $U_k(0) = 1$, $U_i(0) = 0$ for $i \neq k$. It might be noted that the $U_i(j+1)$ are subtotals of the $U_i(j)$, so that the iteration proceeds very rapidly on an adding machine. Since a state $i > k$ at $j = n$ would require more than $n$ values of $Y$, the probability that $d \le k/n$ is

$$[U_0(n) + U_1(n) + \cdots + U_k(n)]\,\frac{n!\,n!}{(2n)!}.$$
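The iteration is easy to mechanize. The sketch below (function names are ours) carries the $2k$ counts $U_0(j), \dots, U_{2k-1}(j)$ forward as running subtotals, exactly as in the difference equations, and uses exact rational arithmetic; note $n!\,n!/(2n)! = 1/\binom{2n}{n}$. A brute-force enumeration of all interleavings of the two samples is included as a check for small $n$; it reproduces, for example, the entries .666667 $(n = 2, k = 1)$ and .900000 $(n = 3, k = 2)$ of Table 1.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def prob_d_le(n, k):
    """Exact P(d <= k/n), k >= 1, for two samples of equal size n when F = G."""
    if k >= n:
        return Fraction(1)  # d never exceeds 1
    states = 2 * k  # i = 0, 1, ..., 2k - 1
    u = [0] * states
    u[k] = 1  # initial condition U_k(0) = 1
    for _ in range(n):
        prefix = [0]
        for v in u:
            prefix.append(prefix[-1] + v)
        # U_i(j+1) = U_0(j) + ... + U_{min(i+1, 2k-1)}(j): a running subtotal
        u = [prefix[min(i + 2, states)] for i in range(states)]
    # Only states i <= k are consistent with exactly n values of Y.
    return Fraction(sum(u[: k + 1]), comb(2 * n, n))


def prob_d_le_brute_force(n, k):
    """Enumerate all C(2n, n) interleavings of the two samples (small n only)."""
    good = 0
    for x_positions in combinations(range(2 * n), n):
        xs = set(x_positions)
        diff, ok = 0, True
        for pos in range(2 * n):
            diff += 1 if pos in xs else -1  # n * (S_n - S'_n) after each observation
            if abs(diff) > k:
                ok = False
                break
        good += ok
    return Fraction(good, comb(2 * n, n))


if __name__ == "__main__":
    print(float(prob_d_le(2, 1)), float(prob_d_le(3, 2)))
```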


REFERENCES 
[1] W. Feller, "On the Kolmogorov-Smirnov theorems," Annals of Math. Stat., Vol. 19 (1948), pp. 177-189.

[2] F. Massey, "A note on the estimation of a distribution function by confidence limits," Annals of Math. Stat., Vol. 21 (1950), pp. 116-119.

[3] F. Massey, "A note on the power of a non-parametric test," Annals of Math. Stat., Vol. 21 (1950), pp. 440-443.

[4] N. Smirnov, "On the estimation of the discrepancy between empirical curves of distribution for two independent samples," Bulletin Mathématique de l'Université de Moscou, Vol. 2 (1939), fasc. 2.

[5] N. Smirnov, "Table for estimating the goodness of fit of empirical distributions," Annals of Math. Stat., Vol. 19 (1948), pp. 279-281.




A NOTE ON THE SURPRISE INDEX 


By R. M. Redheffer


Harvard University 


Let $p_m$ $(m = 0, 1, 2, \dots)$ be a set of probabilities of events $E_m$, and suppose that the event $E_i$, with probability $p_i$, actually occurred. Is the fact that $E_i$ occurred to be regarded as surprising? In a recent article [1] this question is answered by introducing the surprise index $S_i$,

$$S_i = \frac{\sum_m p_m^2}{p_i}, \tag{1}$$

which gives a comparison between the probability expected and that actually observed.¹ The event is to be regarded as surprising when $S_i$ is large.

The author of [1] remarks on the difficulty of computing (1) for the Poisson and binomial distributions. The problem consists in evaluating the numerator, which we shall express here in terms of tabulated functions. The Poisson case leads to Bessel functions, the binomial case to Legendre or hypergeometric functions, and the asymptotic behavior involves square roots only.

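1. The Poisson case. For the Poisson case we have $p_m = \lambda^m e^{-\lambda}/m!$, so that the generating function is

$$e^{-\lambda}e^{\lambda x} = \sum_m p_m x^m. \tag{2}$$

Let $x = e^{i\varphi}$, then $x = e^{-i\varphi}$; multiply the two resulting equations; integrate from 0 to $2\pi$; and simplify slightly to obtain

$$\sum_m p_m^2 = \frac{e^{-2\lambda}}{\pi}\int_0^{\pi} e^{2\lambda\cos\varphi}\,d\varphi. \tag{3}$$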


¹ Cf. also [6].





SURPRISE INDEX 129 


The integral on the right is recognized² as the zero-order Bessel function [2], so that we have

$$\sum_m p_m^2 = e^{-2\lambda}J_0(2i\lambda) = e^{-2\lambda}I_0(2\lambda) \tag{4}$$

as the final answer. The relevant tables are listed on pages 271, 272, and 275 of [5].
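A quick numerical check of (4), as a sketch with illustrative function names: sum $p_m^2$ directly and compare with $e^{-2\lambda}I_0(2\lambda)$, where $I_0$ is evaluated from its power series $I_0(x) = \sum_k (x/2)^{2k}/(k!)^2$ rather than from a table. For large $\lambda$ both approach $1/(2\sqrt{\pi\lambda})$, the asymptotic form discussed in Section 3.

```python
import math


def poisson_sum_sq_direct(lam, rel_tol=1e-17):
    """sum_m p_m^2 for the Poisson(lam) distribution, by summing terms directly."""
    p = math.exp(-lam)  # p_0
    total, m = 0.0, 0
    while True:
        total += p * p
        m += 1
        p *= lam / m  # p_m = p_{m-1} * lam / m
        if m > lam and p * p < rel_tol * total:
            return total


def bessel_i0(x, terms=300):
    """Modified Bessel function I_0(x) = sum_k (x/2)^(2k) / (k!)^2."""
    q = (x / 2) ** 2
    term = total = 1.0
    for k in range(1, terms):
        term *= q / (k * k)
        total += term
    return total


def poisson_sum_sq_bessel(lam):
    """Equation (4): e^{-2 lam} I_0(2 lam)."""
    return math.exp(-2 * lam) * bessel_i0(2 * lam)


if __name__ == "__main__":
    for lam in (0.5, 3.0, 10.0):
        print(lam, poisson_sum_sq_direct(lam), poisson_sum_sq_bessel(lam))
```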

2. The binomial case. When $p_m = \binom{n}{m}p^m q^{n-m}$ with $q = 1 - p$, the value of $\sum p_m^2$ for $p = q = \tfrac12$ is given in the literature [3]; it is the product of the first $n$ odd integers divided by the product of the first $n$ even integers. For general $p$,

$$(q + px)^n = \sum_m p_m x^m \tag{5}$$

is the equation corresponding to (2). Following through the derivation of (3), we get

$$\sum_m p_m^2 = \frac{1}{2\pi}\int_0^{2\pi}(p^2 + 2pq\cos\theta + q^2)^n\,d\theta, \tag{6}$$

which is recognized as the $n$th-order Legendre function [4],

$$\sum_m p_m^2 = |p - q|^n\,P_n\!\left(\frac{p^2 + q^2}{|p - q|}\right) \qquad (p \neq q). \tag{7}$$

For tables see [5], pages 232-235, 242.


The result (6) is also expressible as a hypergeometric function, and this without the intervention of (7). The change of variable $u = p^2 + 2pq\cos\theta + q^2$ leads to

$$\sum_m p_m^2 = \frac{1}{\pi}\int_a^1 u^n(u - a)^{-1/2}(1 - u)^{-1/2}\,du \tag{8}$$

with $a = (p - q)^2$, and letting $u = a + (1 - a)z$ gives an integral which turns out to be [4]

$$\sum_m p_m^2 = (p - q)^{2n}\,F\!\left[-n, \tfrac12; 1; -\frac{4pq}{(p - q)^2}\right].$$


It was brought to the author's attention, by Weaver himself via Mosteller, that (7) is given in Pólya-Szegő, Vol. II, p. 92. There, however, the point of view is to evaluate the integral rather than the sum, and hence the above derivation is the more natural here.
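The three closed forms of the binomial case can be compared with the direct sum. In the sketch below (names are ours), the Legendre polynomial comes from the standard three-term recurrence, and the hypergeometric series terminates after $n + 1$ terms because its first parameter is $-n$. Both closed forms need $p \neq \tfrac12$, while the direct sum at $p = \tfrac12$ reproduces the ratio of odd to even products quoted above. The last helper is the asymptotic form (11) of Section 3.

```python
from math import comb, pi, sqrt


def binom_sum_sq_direct(n, p):
    """sum_m p_m^2 with p_m = C(n, m) p^m q^(n-m)."""
    q = 1 - p
    return sum((comb(n, m) * p ** m * q ** (n - m)) ** 2 for m in range(n + 1))


def legendre(n, z):
    """Legendre polynomial P_n(z) by the three-term recurrence."""
    if n == 0:
        return 1.0
    p_prev, p_cur = 1.0, z
    for k in range(1, n):
        p_prev, p_cur = p_cur, ((2 * k + 1) * z * p_cur - k * p_prev) / (k + 1)
    return p_cur


def binom_sum_sq_legendre(n, p):
    """Equation (7); requires p != 1/2."""
    q = 1 - p
    return abs(p - q) ** n * legendre(n, (p * p + q * q) / abs(p - q))


def binom_sum_sq_hypergeometric(n, p):
    """(p - q)^(2n) F[-n, 1/2; 1; -4pq/(p - q)^2]; the series stops at k = n."""
    q = 1 - p
    a = (p - q) ** 2
    x = -4 * p * q / a
    term = total = 1.0
    for k in range(n):
        term *= (-n + k) * (0.5 + k) / ((k + 1) ** 2) * x
        total += term
    return a ** n * total


def binom_sum_sq_asymptotic(n, p):
    """Equation (11): 1 / (2 sqrt(pi n p q))."""
    return 1 / (2 * sqrt(pi * n * p * (1 - p)))


if __name__ == "__main__":
    print(binom_sum_sq_direct(2, 0.3), binom_sum_sq_legendre(2, 0.3))
```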

3. Approximation. For large values of $\lambda$, (4) gives [2]

$$\sum_m p_m^2 \sim \frac{1}{2\sqrt{\pi\lambda}} \qquad (\lambda \to \infty). \tag{9}$$

² This connection between (3) and the Bessel function was pointed out to the author by M. V. Cerrillo of M. I. T.

To obtain the asymptotic behavior in the binomial case, note that if the limits of integration in (8) were 0 to 1, and if the factor $(u - a)^{-1/2}$ were absent, we should have the Beta function $B(n + 1, \tfrac12)$. Because $u^n$ emphasizes the region near $u = 1$, this resemblance may be exploited to give (after elementary but tedious calculations)


$$\pi\sum_m p_m^2 = (1 - a)^{-1/2}B(n + 1, \tfrac12) + \epsilon \tag{10}$$


with 
0 <e < 2e™ + (3)[8/(1 — a)” 


whenever n > a/(1 — a). Here 6 is any number < pg. Picking 6 = n™’, @ < 1, 
shows that the error goes to zero almost as fast as n™”’”. A similar result may be 
obtained by the methods of Uspensky. 

From (10) we have easily

$$\sum_m p_m^2 \sim \frac{1}{2\sqrt{\pi npq}} \qquad (n \to \infty), \tag{11}$$

which is correct even for $p = q$.


It was pointed out by the referee that (9) and (11) are special cases of the relation

$$\sum_m p_m^2 \sim (4\pi \cdot \text{variance})^{-1/2},$$

which generally holds whenever the shape of the distribution curve approaches a limit.
REFERENCES 
[1] W. Weaver, "Probability, rarity, interest, and surprise," The Scientific Monthly, Vol. 67 (1948), p. 390.

[2] Jahnke and Emde, Tables of Functions, 3rd rev. ed., Dover Publications, 1943, p. 149, p. 117.

[3] Hall and Knight, Higher Algebra, Macmillan Co., 1936, p. 148.

[4] Whittaker and Watson, A Course of Modern Analysis, 4th ed., Cambridge University Press, 1940, p. 312, p. 293.

[5] Fletcher, Miller and Rosenhead, An Index to Mathematical Tables, Scientific Computing Service, Ltd., London, 1946.

[6] T. C. Fry, Probability and Its Engineering Uses, D. Van Nostrand Co., 1928, pp. 199-200.




APPROXIMATION TO THE POINT BINOMIAL 
By Burton H. Camp 
Wesleyan University 
The following approximation to the sum of the first $(t + 1)$ terms of the point binomial appears to be useful. Let this sum be denoted by $S_{N,t}$, and let the point binomial be the expansion of $(p + q)^N$; i.e., let

$$S_{N,t} = p^N + Np^{N-1}q + \cdots + \binom{N}{t}p^{N-t}q^t. \tag{1}$$







Then $S_{N,t}$ is approximately equal to the probability that a unit normal deviate will exceed $x$, where

$$x = \frac{\left(1 - \dfrac{1}{9s}\right)\left[\dfrac{sq}{(t+1)p}\right]^{1/3} - \left(1 - \dfrac{1}{9(t+1)}\right)}{\left\{\dfrac{1}{9(t+1)} + \dfrac{1}{9s}\left[\dfrac{sq}{(t+1)p}\right]^{2/3}\right\}^{1/2}}, \qquad s = N - t. \tag{2}$$

This approximation is a corollary to an approximation given by Paulson [1] to the table of the integral of Snedecor's $F$ (Fisher and Yates' $w = e^{2z}$), and to the known facts that this integral is an incomplete Beta function [2] of a simple transform of $F$, and that $S_{N,t}$ is also an incomplete Beta function of suitable arguments. Paulson's approximation appeared to be quite close. Since it was essentially an approximation to the incomplete Beta function, we must now have a similarly close approximation to the point binomial. Therefore two illustrations will suffice.





          Example 1: $(.8 + .2)^8$       Example 2: $(.9 + .1)^{50}$

          Approx.    True               Approx.    True     Error
           .166      .168                .005      .005      .000
           .505       …                  .033      .034     -.001
           .801       …                  .250      .250      .000
           .943      .944                .617      .616      .001
           .999      .999                .992      .991      .001

Both these examples involve strongly skewed distributions, one with a small value of $N$ and the other with a fairly large value of $N$. Considering the amount of computation involved, this approximation is much more satisfactory than any other in this author's experience.
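To reproduce the illustrations, the sketch below computes the exact sum (1) and the normal approximation. We assume that (2) is Paulson's cube-root normalization of $F$ with $\nu_1 = 2(t+1)$ and $\nu_2 = 2s$ degrees of freedom, evaluated at $F = sq/((t+1)p)$; under that reading it gives .166 and .505 for the first two rows of Example 1. The function names are ours.

```python
from math import comb, erfc, sqrt


def point_binomial_sum(N, t, p):
    """Exact S_{N,t}: the first t + 1 terms of (p + q)^N in descending powers of p."""
    q = 1 - p
    return sum(comb(N, i) * p ** (N - i) * q ** i for i in range(t + 1))


def camp_approx(N, t, p):
    """Normal approximation (2): P(Z > x) for a unit normal deviate Z; needs 0 <= t < N."""
    q = 1 - p
    s = N - t
    f0 = s * q / ((t + 1) * p)  # the F-type ratio inside (2)
    num = (1 - 1 / (9 * s)) * f0 ** (1 / 3) - (1 - 1 / (9 * (t + 1)))
    den = sqrt(1 / (9 * (t + 1)) + f0 ** (2 / 3) / (9 * s))
    x = num / den
    return 0.5 * erfc(x / sqrt(2))  # upper tail of N(0, 1)


if __name__ == "__main__":
    for t in range(4):
        print(t, round(camp_approx(8, t, 0.8), 3), round(point_binomial_sum(8, t, 0.8), 3))
```

Only one cube root and one normal tail are needed per value of $t$, which is the economy of computation the author emphasizes.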


REFERENCES 


[1] E. Paulson, "An approximate normalization of the analysis of variance distribution," Annals of Math. Stat., Vol. 13 (1942), p. 233.

[2] S. S. Wilks, Mathematical Statistics, Princeton University Press, 1943, p. 115.







A THEOREM ON THE CORRELATION COEFFICIENT FOR SAMPLES 
OF THREE WHEN THE VARIABLES ARE INDEPENDENT 


By C. Chandra Sekar¹
All India Institute of Hygiene and Public Health, Calcutta 


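In this note the following theorem will be established:

THEOREM. If $(x_i, y_i)$, $i = 1, 2, 3$, denote three pairs of random values of two independent continuous stochastic variables $x$ and $y$, if $r$, their correlation coefficient, is given by

$$r = \frac{1}{3s_x s_y}\sum_{i=1}^{3}(x_i - \bar x)(y_i - \bar y), \tag{1}$$

where

$$s_x^2 = \frac13\sum_{i=1}^{3}(x_i - \bar x)^2, \qquad s_y^2 = \frac13\sum_{i=1}^{3}(y_i - \bar y)^2, \tag{2}$$

and if $P(a < r < b)$ denotes the probability of $r$ taking values in the range $a < r < b$, then

$$P(-1 < r < -\tfrac12) = P(-\tfrac12 < r < \tfrac12) = P(\tfrac12 < r < 1) = \tfrac13. \tag{3}$$

PROOF. If $\tau_i$ is defined by

$$\tau_i = \frac{x_i - \bar x}{s_x}, \qquad i = 1, 2, 3, \tag{4}$$

it is readily seen that the three values of $\tau$ are connected by the two relations

$$\sum_{i=1}^{3}\tau_i = 0, \qquad \sum_{i=1}^{3}\tau_i^2 = 3. \tag{5}$$

Similar conditions exist between the three $t_i$'s defined by

$$t_i = \frac{y_i - \bar y}{s_y}, \qquad i = 1, 2, 3. \tag{6}$$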

The set $(\tau_1, \tau_2, \tau_3)$ can be considered as the Cartesian coordinates of a point in three-dimensional space. The conditions (5) restrict the point to a circle, namely the intersection of the plane $\sum\tau_i = 0$ with the sphere of radius $\sqrt3$ about the origin. The set $(t_1, t_2, t_3)$ defined by (6) represents a point on the same circle. The correlation coefficient $r$, defined in (1) and also given by

$$r = \frac13\sum_{i=1}^{3}\tau_i t_i, \tag{7}$$

can therefore be regarded as the cosine of the angle $\theta$ between the lines joining $(\tau_1, \tau_2, \tau_3)$ and $(t_1, t_2, t_3)$, respectively, to the centre of the above-mentioned circle.

The relationships between the $\tau_i$'s given by (5) make it necessary for one value of the $\tau_i$'s to occur in each of the three non-overlapping intervals $-\sqrt2$ to $-1/\sqrt2$, $-1/\sqrt2$ to $1/\sqrt2$, and $1/\sqrt2$ to $\sqrt2$. Exactly the same conditions hold for the $t_i$'s.²

¹ On loan to Population Division, United Nations.

The 6 permutations of $\tau_1, \tau_2, \tau_3$ in these three intervals correspond to a subdivision of the circle on which the point $(\tau_1, \tau_2, \tau_3)$ lies into 6 equal arcs of 60° each. Every point on any one of these arcs can be shown to correspond, one to one, to the position of $\tau_1$ in one of the intervals; also, proceeding along the circle, points on three alternate arcs correspond to the positions of $\tau_1$ as it takes on values from the highest to the lowest in this interval, and points on the other three correspond to the positions of $\tau_1$ as it moves from the lowest to the highest value.

It is clear that when adjacent arcs are combined in pairs, dividing the circle into 3 equal arcs of 120°, the probability density function of $(\tau_1, \tau_2, \tau_3)$ is the same on the 3 arcs and is symmetric on each. At any three points on the circle which divide it into three arcs of 120°, the probability density function of $(\tau_1, \tau_2, \tau_3)$ is therefore the same. The same conditions hold for $(t_1, t_2, t_3)$.

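Since the two points are independent, the density of the angle $\theta$ between them (measured from $-\pi$ to $\pi$) is then unchanged by a rotation through 120°. It therefore follows that

$$P\left(-\frac{\pi}{3} < \theta < \frac{\pi}{3}\right) = P\left(\frac{\pi}{3} < \theta < \frac{2\pi}{3} \text{ or } -\frac{2\pi}{3} < \theta < -\frac{\pi}{3}\right) = P\left(\frac{2\pi}{3} < \theta < \pi \text{ or } -\pi < \theta < -\frac{2\pi}{3}\right),$$

which, since $r = \cos\theta$, is (3).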
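The theorem is easy to check by simulation. The sketch below is our illustration, with hypothetical function names and arbitrarily chosen distributions: it draws independent samples of three from a skewed (exponential) $x$ and a rectangular $y$ and counts how often $r$ falls in each of the three intervals of (3). The frequencies should all be close to $\tfrac13$, whatever continuous distributions are substituted.

```python
import math
import random


def correlation(xs, ys):
    """Sample correlation coefficient r of (1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


def interval_frequencies(trials, seed=0):
    """Monte Carlo frequencies of r in (-1, -1/2), (-1/2, 1/2), (1/2, 1) for samples of 3."""
    rng = random.Random(seed)
    counts = [0, 0, 0]
    for _ in range(trials):
        xs = [rng.expovariate(1.0) for _ in range(3)]  # skewed x
        ys = [rng.random() for _ in range(3)]  # rectangular y, independent of x
        r = correlation(xs, ys)
        counts[0 if r < -0.5 else 1 if r < 0.5 else 2] += 1
    return [c / trials for c in counts]


if __name__ == "__main__":
    print(interval_frequencies(60000, seed=1))
```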


(ee 


CORRECTION TO “THE DISTRIBUTION OF EXTREME VALUES 
IN SAMPLES WHOSE MEMBERS ARE SUBJECT TO A 
MARKOFF CHAIN CONDITION” 


By Benjamin Epstein
Wayne University 


In the paper mentioned in the title (Annals of Math. Stat., Vol. 20 (1949), pp. 590-594) I claim to have proved a number of results dealing with the distribution of extreme values in samples of size n drawn at equally spaced intervals from a stationary Markoff process. As Professor W. Feller has kindly pointed


² This property has been utilised by the author and S. C. Bhoumik to obtain distributions of the correlation coefficient for samples of three, under varying assumptions regarding the distributions of the independent variables $x$ and $y$. The distribution of $\tau_i$ or $t_i$ is also of help in working out the distribution of Fisher's $g_1$ for samples of three. For the distribution of $g_1$ for samples of three from a continuous rectangular distribution, refer to C. Chandra Sekar in Current Science, Vol. 13 (1944), pp. 10-11.





134 ABSTRACTS 


out to me in personal correspondence, this is actually not the case. However, the theorems and their proofs remain completely valid in their present form if the observations are drawn from a stochastic process satisfying condition (5) of the paper. This chain condition requires that

$$\Pr(X_n \le z \mid X_1 \le z, X_2 \le z, \ldots, X_{n-1} \le z) = \Pr(X_n \le z \mid X_{n-1} \le z)$$

be satisfied for all $z$ and for all positive integers $n$.
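The condition turns the joint probability into a product: by the chain rule, a stationary process satisfying it has $\Pr(\max_k X_k \le z) = F(z)\,q(z)^{n-1}$, with $F(z) = \Pr(X_1 \le z)$ and $q(z) = \Pr(X_2 \le z \mid X_1 \le z)$ (our restatement, not a quotation of the paper's theorems). The sketch below uses a hypothetical three-state stationary Markoff chain: when $\{X_k \le z\}$ is a single state the condition holds and the product agrees with the exact value, but when it is a union of two states the condition generally fails, in line with the remark that a stationary Markoff process need not satisfy it.

```python
# A hypothetical three-state stationary Markoff chain with states 0, 1, 2.
P = [[0.5, 0.3, 0.2],
     [0.2, 0.5, 0.3],
     [0.3, 0.3, 0.4]]

def stationary(P, iters=2000):
    """Stationary distribution by repeated multiplication pi <- pi P."""
    k = len(P)
    pi = [1.0 / k] * k
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(k)) for j in range(k)]
    return pi

def exact_all_in(P, pi, A, n):
    """Pr(X_1, ..., X_n all lie in the state set A), by a forward recursion."""
    w = {i: pi[i] for i in A}
    for _ in range(n - 1):
        w = {j: sum(w[i] * P[i][j] for i in A) for j in A}
    return sum(w.values())

def chain_condition_value(P, pi, A, n):
    """F * q**(n-1), the value implied by the chain condition, where
    F = Pr(X_1 in A) and q = Pr(X_2 in A | X_1 in A)."""
    F = sum(pi[i] for i in A)
    q = sum(pi[i] * P[i][j] for i in A for j in A) / F
    return F * q ** (n - 1)

pi = stationary(P)
for A in ([0], [0, 1]):   # {X <= z} for 0 <= z < 1, and for 1 <= z < 2
    print(A, exact_all_in(P, pi, A, 5), chain_condition_value(P, pi, A, 5))
```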




ABSTRACTS OF PAPERS 
(Abstracts of papers presented at the Chicago meeting of the Institute, December 27-29, 1950)


1. Cost Functions for Sample Surveys. (Preliminary Report). Garnet E. McCreary, University of Manitoba and Iowa State College.


Assume: (1) one travels in a rectangular (grid) fashion rather than along a straight-line (airline) path, (2) n random points have a uniform distribution over the region or stratum. Moderate changes in the shape of regions have a minor effect on expected distance. Mean airline distance can be predicted from mean grid distance fairly accurately. The following formulas are derived: (1) expected minimum grid distance for n = 3 in a square, (2) an upper bound to expected minimum grid distance for all n, (3) expected grid distance for a stratified and an unstratified sample, if the path among the points does not reverse a certain direction, (4) expected distance of a random point from (a) the center of the arc of the circle, semicircle or quadrant, (b) any fixed point, inside or outside the rectangular region, (5) mean square distance between any pair of points adjacent in a clockwise direction (6.7 to 9.5 per cent biased upwards over the corresponding mean airline distance). Certain conclusions are drawn regarding the most efficient design with respect to total distance. Detailed mileage records of three Iowa farm surveys were compared with theoretical estimates. If the cost is balanced against the losses resulting from errors in estimate, for a particular design, the problem of determining sample size is broached by using Wald's minimax principle.
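As a rough illustration of why airline distance tracks grid distance, consider the simplest case of two independent uniform points in a unit square (our simplification, not one of the formulas in the abstract): the mean grid distance is exactly 2/3 and the mean airline distance is about 0.5214, a ratio near 0.78. The Monte Carlo sketch below estimates both.

```python
import math
import random

def mean_distances(trials=200_000, seed=7):
    """Monte Carlo mean grid (rectangular) and airline (straight-line) distance
    between two independent uniform points in the unit square."""
    rng = random.Random(seed)
    grid = air = 0.0
    for _ in range(trials):
        dx = abs(rng.random() - rng.random())
        dy = abs(rng.random() - rng.random())
        grid += dx + dy              # travel along the grid
        air += math.hypot(dx, dy)    # straight-line distance
    return grid / trials, air / trials

grid, air = mean_distances()
print(grid, air, air / grid)   # grid near 2/3, airline near 0.5214, ratio near 0.78
```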


2. On a Preliminary Test for Pooling Mean Squares in the Analysis of Variance. A. E. Paull, Abitibi Power and Paper Company, Limited, Toronto, Canada.


The consequences of performing a preliminary F-test in the analysis of variance are described. The use of the 5% or 25% significance level for the preliminary test results in disturbances that are frequently large enough to lead to incorrect inferences in the final test. A more stable procedure is recommended for performing the preliminary test, in which the two mean squares are pooled only if their ratio is less than twice the 50% point.


3. Estimation for Sub-Sampling Designs Employing the County as a Primary Sampling Unit. Emil H. Jebe, Iowa State College and North Carolina State College.


This paper summarizes a study of the application of various two-stage designs, including the estimation procedures, for providing state estimates of agricultural items in North Carolina. Among the principal objectives of the investigation were (1) the comparison of the efficiency of selection of the primary units with equal and with unequal probabilities, and (2) assessment of the relative contributions of the between-primary-sampling-unit and within-primary-sampling-unit error components to the total sampling error. Examination of several linear and ratio estimates indicates a number of advantages for a particular ratio estimate.







4. The Probability Distribution of the Number of Isolates in a Social Group. Leo Katz, Michigan State College.


Each of the N members of a well-defined social group is asked to name d others with whom he would prefer to be associated in some specified activity. Under the null hypothesis, his choices are randomly distributed. An isolate is an individual who is not chosen by any of the other members of the group. The probability of exactly i isolates in the group is then given by

$$P_i = \sum_{k=i}^{N} (-1)^{k-i}\, C(k, i)\, C(N, k)\, C(N-k, d)^{k}\, C(N-1-k, d)^{N-k}\, C(N-1, d)^{-N},$$

where $C(N, n) = {}_N C_n$, the binomial coefficient. This expression for $P_i$ is somewhat unwieldy. It is further shown that this probability function is asymptotically a binomial p.f., $P_i' = C(n, i)\,p^i (1-p)^{n-i}$, where

$$p = N\left[\frac{N-1-d}{N-1}\right]^{N-1} - (N-1)\left[\frac{N-1-d}{N-1}\right]\left[\frac{N-2-d}{N-2}\right]^{N-2}$$

and $np = N[(N-1-d)/(N-1)]^{N-1}$. The approximation is very good even for moderately small values of N.
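The exact distribution is easy to tabulate with rational arithmetic. The sketch below evaluates $P_i$ by the inclusion-exclusion sum $\sum_{k \ge i} (-1)^{k-i} C(k,i)\, S_k$, where $S_k = C(N,k)\,C(N-k,d)^k\,C(N-1-k,d)^{N-k}/C(N-1,d)^N$ is the summed probability that a given set of k members are all isolates. It assumes $1 \le d \le N-2$ and checks that the probabilities sum to one, that the mean equals $np = N[(N-1-d)/(N-1)]^{N-1}$, and that $p$ equals one minus the variance-to-mean ratio, so the approximating binomial matches the first two moments. Function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def isolate_pmf(N, d):
    """Exact P_0, ..., P_N for the number of isolates (assumes 1 <= d <= N - 2)."""
    denom = Fraction(comb(N - 1, d)) ** N
    pmf = []
    for i in range(N + 1):
        total = 0
        # The k = N term vanishes because C(0, d) = 0 for d >= 1.
        for k in range(i, N):
            total += ((-1) ** (k - i) * comb(k, i) * comb(N, k)
                      * comb(N - k, d) ** k * comb(N - 1 - k, d) ** (N - k))
        pmf.append(total / denom)
    return pmf

def binomial_approx(N, d):
    """Return (n, p) of the approximating binomial; n need not be an integer."""
    r1 = Fraction(N - 1 - d, N - 1)
    r2 = Fraction(N - 2 - d, N - 2)
    mean = N * r1 ** (N - 1)                     # np
    p = mean - (N - 1) * r1 * r2 ** (N - 2)
    return mean / p, p

N, d = 12, 3
pmf = isolate_pmf(N, d)
mean = sum(i * q for i, q in enumerate(pmf))
var = sum(i * i * q for i, q in enumerate(pmf)) - mean ** 2
n, p = binomial_approx(N, d)
print(float(mean), float(var), float(n), float(p))
```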


5. Estimating Population Size Using Sequential Sampling Tagging Methods. Leo A. Goodman, University of Chicago.


Let $[n_i]$ be a sequence of positive integers and let $S(L, n_i)$ denote the procedure whereby (1) $n_1$ elements are drawn at random from a population P, then tagged to distinguish them from the remaining elements, and replaced in P, (2) $n_2$ elements are drawn from P, the number of tagged elements appearing is observed, and the $n_2$ elements are then tagged and replaced in P, (3) ..., this process being halted when at least $L > 0$ tagged elements have appeared. Given $S(L, n_i)$, there exists a minimum variance unbiased estimator (m.v.u.e.) of the number N of elements in P, which may be determined as the quotient of two determinants and simplified, by combinatorial methods, in special cases. If $[n_i]$ is bounded, then as N approaches infinity the limiting distribution of $t^2/N$, where t is the total number of elements drawn before the procedure ceases, is $\chi^2$ with 2L degrees of freedom. Hence the asymptotic m.v.u.e. of N, confidence intervals and tests of hypotheses for N may be obtained, as well as the approximate fiducial distribution of N. Similar results may be obtained for the more general cases where (a) information concerning the size of some subclasses in P is used and (b) taggings may or may not be differentiated. The $S(L, n_i)$ compares favorably with other procedures considered.
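A small simulation illustrates the limiting result. The sketch takes every $n_i = 1$ (a hypothetical bounded sequence), N = 20,000 and L = 2, and compares the average of $t^2/N$ with 2L = 4, the mean of a $\chi^2$ variate with 2L degrees of freedom. Heuristically, with bounded $n_i$ a drawn element is already tagged with probability close to (number drawn so far)/N, so the count of tagged appearances after t draws is roughly Poisson with mean $t^2/(2N)$.

```python
import random

def simulate_t(N, L, rng):
    """One run of S(L, n_i) with every n_i = 1: draw an element, note whether it is
    tagged, tag it and replace it; stop once L tagged elements have appeared.
    Returns t, the total number of elements drawn."""
    tagged = set()
    seen_tagged = 0
    t = 0
    while seen_tagged < L:
        e = rng.randrange(N)
        t += 1
        if e in tagged:
            seen_tagged += 1
        tagged.add(e)
    return t

def mean_scaled_t2(N=20_000, L=2, reps=3_000, seed=3):
    """Average of t^2 / N over independent runs."""
    rng = random.Random(seed)
    return sum(simulate_t(N, L, rng) ** 2 / N for _ in range(reps)) / reps

est = mean_scaled_t2()
print(est)   # close to 2L = 4
```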


6. Application of the Distribution of a Linear Form in Chi-square Variates. Arthur Grad and Herbert Solomon, Office of Naval Research, Washington, D. C.


The probability of hitting a target depends both on the accuracy with which the position of the target is known and on the dispersion of the weapon about the point of aim. Under the assumption that each of these errors has a bivariate Gaussian distribution with known covariance matrix, $\|\sigma(p)\|$ for position prediction error and $\|\sigma(a)\|$ for aiming error, about the point of aim (predicted position), the probability, P, of hitting the target with a weapon having a radius of effectiveness R is shown to be $P = \Pr\{k_1\chi_1^2 + k_2\chi_2^2 \le R^2/C^2\}$, where $k_1 = [\sigma_{11}(p) + \sigma_{11}(a)]/C^2$, $k_2 = [\sigma_{22}(p) + \sigma_{22}(a)]/C^2$, $C^2 = \sigma_{11}(p) + \sigma_{22}(p) + \sigma_{11}(a) + \sigma_{22}(a)$, and $\chi_i^2$ is a chi-square variate with 1 degree of freedom. When $\sigma_{12}(p) = \sigma_{12}(a) = 0$, the chi-square variates are independent. If not, a linear transformation exists such that $z = k_1\chi_1^2 + k_2\chi_2^2 = l_1 y_1 + l_2 y_2$, where $l_1 + l_2 = k_1 + k_2$ and $y_1$ and $y_2$ are independently distributed chi-square variates, each having one degree of freedom. It is then demonstrated that

$$P = 2\sqrt{k_1 k_2}\int_0^{t} e^{-z}\, I_0\!\left[z(1 - 4k_1k_2)^{1/2}\right] dz, \qquad t = \frac{R^2}{4C^2 k_1 k_2},$$

when the chi-square variates are independent; in case of dependence, $k_i$ should be replaced by $l_i$. A table was constructed which covers the entire range of the parameters.
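The integral is straightforward to evaluate numerically. The sketch below treats the independent case, with weights $k_1, k_2$ satisfying $k_1 + k_2 = 1$ as implied by the definition of $C^2$ (in the dependent case one would pass $l_1, l_2$ instead). It computes $I_0$ from its power series, applies Simpson's rule, and cross-checks against direct simulation of $k_1\chi_1^2 + k_2\chi_2^2$. When $k_1 = k_2 = 1/2$ the formula reduces to $1 - e^{-R^2/C^2}$, a convenient check. Function names are hypothetical.

```python
import math
import random

def bessel_i0(x, terms=40):
    """Modified Bessel function I_0 via its power series sum_m (x/2)^(2m) / (m!)^2."""
    total, term = 0.0, 1.0
    for m in range(terms):
        total += term
        term *= (x / 2) ** 2 / (m + 1) ** 2
    return total

def hit_probability(k1, k2, r2_over_c2, steps=400):
    """P = 2 sqrt(k1 k2) * integral_0^t exp(-z) I_0(z sqrt(1 - 4 k1 k2)) dz,
    t = R^2 / (4 C^2 k1 k2). Assumes k1, k2 > 0, k1 + k2 = 1, steps even (Simpson)."""
    t = r2_over_c2 / (4 * k1 * k2)
    c = math.sqrt(max(0.0, 1 - 4 * k1 * k2))
    f = lambda z: math.exp(-z) * bessel_i0(c * z)
    h = t / steps
    s = f(0.0) + f(t) + sum((4 if j % 2 else 2) * f(j * h) for j in range(1, steps))
    return 2 * math.sqrt(k1 * k2) * s * h / 3

def hit_probability_mc(k1, k2, r2_over_c2, trials=100_000, seed=11):
    """Direct simulation of Pr{k1 chi1^2 + k2 chi2^2 <= R^2 / C^2}."""
    rng = random.Random(seed)
    hits = sum(k1 * rng.gauss(0, 1) ** 2 + k2 * rng.gauss(0, 1) ** 2 <= r2_over_c2
               for _ in range(trials))
    return hits / trials

p_int = hit_probability(0.7, 0.3, 1.0)
p_mc = hit_probability_mc(0.7, 0.3, 1.0)
print(p_int, p_mc)   # the two estimates should agree closely
```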


7. A Large Sample t-statistic which Is Insensitive to Nonrandomness. John E. Walsh, The Rand Corporation.


Most of the well known significance tests and confidence intervals for the population mean are based on the assumption of a random sample. This paper considers how the significance levels and confidence coefficients of the commonly used class of tests and intervals based on the standard Student t-statistic are changed when the random sample requirement is violated and the number of observations is large. It is found that even a slight deviation from the random sample situation can result in a substantial change in significance level and confidence coefficient. Thus this class of tests and confidence intervals would seem to be of questionable practical value for large sets of observations. Large sample tests and confidence intervals for the mean which are not sensitive to the random sample requirement are obtained for a situation of practical interest by development of a special type of t-statistic. These results are as efficient (asymptotically) as those based on the standard t-statistic for the case of a random sample.
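The sensitivity can be made concrete under an assumed model of nonrandomness that the abstract does not specify: a stationary first-order autoregression with lag-one correlation $\rho$. For large n the variance of the sample mean is then inflated by the factor $(1+\rho)/(1-\rho)$ while the usual variance estimate is not, so the nominal 5% two-sided t test has actual level about $2[1 - \Phi(1.96\sqrt{(1-\rho)/(1+\rho)})]$. The sketch evaluates this; at $\rho = 0.1$ it gives roughly 0.076.

```python
import math

def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def actual_level(rho, z_crit=1.96):
    """Approximate large-sample level of the nominal 5% two-sided t test of the mean
    when the data follow a stationary AR(1) with lag-one correlation rho (assumed
    model): the t statistic is then roughly normal with variance (1 + rho) / (1 - rho)."""
    sd = math.sqrt((1 + rho) / (1 - rho))
    return 2 * (1 - normal_cdf(z_crit / sd))

for rho in (0.0, 0.05, 0.1, 0.2):
    print(rho, round(actual_level(rho), 3))
```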


8. Conditional Expectation and Convex Functions. E. W. Barankin, University of California, Berkeley.


The inequality $E\psi(E(f \mid t)) \le E\psi(f)$ (where the conditional expectation is taken with respect to a function $t$), with $f$ a real- (or complex-) valued function on the fundamental space, was shown by Blackwell to hold in the case $\psi(z) = |z|^2$, and by the present author to hold in the case $\psi(z) = |z|^s$, $s \ge 1$ (Annals of Math. Stat., Vol. 18 (1947), pp. 105-110, and Vol. 21 (1950), pp. 280-284, respectively). More recently Hodges and Lehmann (Annals of Math. Stat., Vol. 21 (1950), pp. 182-197) proved the inequality in the case of $f$ a function to $\mathcal{E}^k$ (Euclidean k-space) and $\psi$ a finite, convex, real-valued function on $\mathcal{E}^k$. Now, both Blackwell and this author exhibited the above inequality, in their cases, as (obvious) consequences of the more fundamental relation $\psi(E(f \mid \tau)) \le E(\psi(f) \mid \tau)$ for almost all points $\tau$ in the range of $t$. The work of Hodges and Lehmann, however, leaves open the question whether or not the latter inequality holds in the more general case. In the present note this almost-everywhere inequality is established for $f$ to $\mathcal{E}^k$ and $\psi$ convex. The first inequality then obtains by integration.
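On a finite probability space the almost-everywhere inequality can be checked point by point. The sketch below uses a hypothetical five-point space, a function $t$ with three values, an $f$ with values in $\mathcal{E}^2$, and the convex function $\psi(v) = \|v\| + (v_1 - v_2)^2$ (our choice). It verifies $\psi(E(f \mid \tau)) \le E(\psi(f) \mid \tau)$ for each value $\tau$ of $t$ and then the integrated inequality.

```python
import math

# Hypothetical finite probability space: (probability, value of f in E^2, value of t).
POINTS = [
    (0.10, (1.0, 2.0), "a"),
    (0.20, (-1.0, 0.5), "a"),
    (0.15, (3.0, -2.0), "b"),
    (0.25, (0.0, 1.0), "b"),
    (0.30, (2.0, 2.0), "c"),
]

def psi(v):
    """A finite convex function on E^2: Euclidean norm plus a convex quadratic."""
    return math.hypot(v[0], v[1]) + (v[0] - v[1]) ** 2

def conditional(tau):
    """Return (E(f | t = tau), E(psi(f) | t = tau), Pr(t = tau))."""
    cell = [(p, v) for p, v, t in POINTS if t == tau]
    mass = sum(p for p, _ in cell)
    mean_f = tuple(sum(p * v[k] for p, v in cell) / mass for k in range(2))
    mean_psi = sum(p * psi(v) for p, v in cell) / mass
    return mean_f, mean_psi, mass

lhs_total = rhs_total = 0.0
for tau in ("a", "b", "c"):
    mean_f, mean_psi, mass = conditional(tau)
    assert psi(mean_f) <= mean_psi + 1e-12      # conditional (pointwise) inequality
    lhs_total += mass * psi(mean_f)              # builds E psi(E(f | t))
    rhs_total += mass * mean_psi                 # builds E psi(f)
print(lhs_total, rhs_total)   # the first is no larger than the second
```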


9. Transformation Parameters. Melvin P. Peisakoff, The Rand Corporation.


Location, scale, and location-scale parameters are examples of transformation parameters. Transformation parameters are defined by: (1) the parameter space is a group, (2) the sample space can be factored into the same group and an arbitrary space, (3) the random variable associated with each parameter point, θ, can be generated by drawing from the population associated with the unit of the parameter space and left multiplying the group component of the sample by θ. Decision function theory is investigated when the decision space and the cost function are of a special, intuitively appealing form. The formulation is broad enough to include sequential analysis. Minimax decision functions are found. Also investigated are testing and confidence region theory, using extensively the results on decision functions. Both simple and composite hypotheses are treated. Finally, (Fisher) information theory is examined. It is shown that modifications are necessary if information theory is to be useful in estimation problems. One modification is suggested. This modification enlarges the class of standard estimators to include each estimator which is minimax with respect to a certain risk function determined by the estimator itself. The approach is generalized to include inequalities for the mean square error other than the information inequality.


10. A Generalization of the Neyman-Pearson Fundamental Lemma. Henry Scheffé, Columbia University.


Given m + n real integrable functions $f_1(x), \ldots, f_m(x), h_1(x), \ldots, h_n(x)$ of a point x in a Euclidean space R, a real function $\phi(y_1, \ldots, y_n)$ of n real variables, and m constants $c_1, \ldots, c_m$, the problem is to consider the existence of, and to find necessary conditions and sufficient conditions on, a set S maximizing $\phi\left(\int_S h_1\,dx, \ldots, \int_S h_n\,dx\right)$ subject to the m side conditions $\int_S f_j\,dx = c_j$. In some applications the values of the vector $\left(\int_S h_1\,dx, \ldots, \int_S h_n\,dx\right)$ may also be restricted to a given set. A statistical example in which $\phi(y_1, \ldots, y_n) = \prod_{i=1}^{n} y_i$ arose in an unpublished paper of Isaacson. The methods of the present paper are suggested by those of an unpublished paper of Dantzig and Wald. Under certain regularity conditions the inequalities appearing in the Neyman-Pearson lemma are replaced by

$$\sum_{i=1}^{n} a_i h_i(x) - \sum_{j=1}^{m} k_j f_j(x) \ge 0 \ \text{(a.e. in } S), \qquad \le 0 \ \text{(a.e. in } R - S).$$

Here $a_i$ and $k_j$ are constants, with $a_i = \partial\phi/\partial y_i$ evaluated at $(y_1, \ldots, y_n) = \left(\int_S h_1\,dx, \ldots, \int_S h_n\,dx\right)$.
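In the classical special case $m = n = 1$, $\phi(y) = y$ (a simplification chosen for illustration), the condition reduces to the Neyman-Pearson form $h(x) - k f(x) > 0$ on S. On a finite space, with sums in place of integrals, the sketch below builds such a set for several $k \ge 0$ and confirms by brute force that no set whose sum of f is at most that of the threshold set gives a larger sum of h. The values of h and f are hypothetical.

```python
from itertools import combinations

# Hypothetical values of h and f on a six-point space (sums replace integrals).
h = [0.30, 0.05, 0.20, 0.10, 0.25, 0.10]
f = [0.10, 0.20, 0.05, 0.30, 0.15, 0.20]

def neyman_pearson_set(k):
    """S = {x : h(x) - k f(x) > 0}, the form prescribed when m = n = 1, phi(y) = y."""
    return [x for x in range(len(h)) if h[x] - k * f[x] > 0]

def brute_force_max(c):
    """Largest sum of h over all sets S whose sum of f does not exceed c."""
    best = 0.0
    for r in range(len(h) + 1):
        for S in combinations(range(len(h)), r):
            if sum(f[x] for x in S) <= c + 1e-12:
                best = max(best, sum(h[x] for x in S))
    return best

results = []
for k in (0.5, 1.0, 2.0, 3.0):
    S = neyman_pearson_set(k)
    c = sum(f[x] for x in S)
    results.append((sum(h[x] for x in S), brute_force_max(c)))
    print(k, S, results[-1])   # the two numbers agree
```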


11. Nonparametric Estimation V. Sequentially Determined Statistically Equivalent Blocks. D. A. S. Fraser, University of Toronto.


In 1943 Wald gave a method for constructing tolerance regions for continuous multivariate distributions. Tukey generalized Wald's procedure and then interpreted the results for discontinuous distributions. In this paper a further generalization of the method is given by which statistically equivalent blocks can be determined sequentially; that is, the particular function used to cut off a block may depend on the shape or structure of previously selected blocks. The results are also extended to the case of discontinuous distributions. Possible advantages for the practitioner are discussed.


12. A Bayes Approach to a Quality Control Model. M. A. Girshick and Herman Rubin, Stanford University.


A machine producing items of quality characteristic x can be in one of four states. In state i = 1, 2 the machine is in production and is characterized by a density $f_i(x)$. In state j = 3, 4 the machine is in repair, having come from state j − 2. When the machine is in state 1 there is a probability g that in the next time unit it enters state 2, remaining in state 2 until brought to repair by some rule R based on observations. The income from items of quality x is V(x); the repair cost per unit time in state j = 3, 4 is $c_j$. A rule R* is Bayes if it maximizes $\lim J_N$ as $N \to \infty$, where $J_N$ is the expected income per unit time in N time units. It is proved that for 100% inspection, R* states that sampling is to continue as long as $Z_n < a$, and that sampling is to terminate and the machine be placed in repair when $Z_n \ge a$, where $Z_n = y_n(1 + Z_{n-1})$, $Z_0 = 0$ and $y_n = f_2(x_n)/[(1 - g)f_1(x_n)]$. R* is also obtained in case inspection costs are taken into account. It is shown that the above Markoff process approaches a stable distribution and the required integral equations are derived.
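The Bayes rule for 100% inspection is a simple recursion. The sketch below implements it as stated, with hypothetical normal densities $f_1 = N(0,1)$ and $f_2 = N(1,1)$, $g = 0.05$ and $a = 10$. These values are illustrative only; choosing the $a$ that actually maximizes long-run income is not attempted here.

```python
import math

def normal_pdf(x, mu, sigma=1.0):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def repair_time(xs, a, g, f1, f2):
    """Return the first n with Z_n >= a (machine sent to repair), or None if sampling
    continues through all of xs. Z_0 = 0, Z_n = y_n (1 + Z_{n-1}),
    y_n = f2(x_n) / ((1 - g) f1(x_n))."""
    z = 0.0
    for n, x in enumerate(xs, start=1):
        y = f2(x) / ((1 - g) * f1(x))
        z = y * (1 + z)
        if z >= a:
            return n
    return None

f1 = lambda x: normal_pdf(x, 0.0)   # hypothetical in-control density (state 1)
f2 = lambda x: normal_pdf(x, 1.0)   # hypothetical deteriorated density (state 2)
data = [0.1, -0.3, 0.2, 1.5, 1.2, 1.8, 0.9, 1.4]
print(repair_time(data, a=10.0, g=0.05, f1=f1, f2=f2))   # repair signalled at n = 5
```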


13. On the Translation Parameter Problem for Discrete Variables. David Blackwell, Stanford University.


Let $x = (x_1, \ldots, x_n)$ be a vector chance variable, let $y = x + he$, where $e = (1, \ldots, 1)$ and h is an unknown constant, and let $t = t(y)$ be any function of y, considered as an estimate for h when y is observed. Let $f(d)$ be any function of a real variable d, considered as the loss to the statistician when the error of estimate is d, so that the risk from an estimate t is $R_t(h) = Ef[h - t(x + eh)]$. Extending the work of Pitman, Girshick and Savage have exhibited an estimate $t^*$ for which $R_{t^*}(h) = R$, independent of h, and have shown that $t^*$ is minimax. It is shown here that if x assumes only a finite number of values $v_j = (n_{j1}, \ldots, n_{jn})$ and each $n_{jk}$ is an integer, and if $f(d)$ is strictly convex and assumes its minimum value, then $t^*$ is admissible and is in fact the unique minimax estimate. Two examples in which $t^*$ is not admissible are given. A closely related fact is that if S is a closed bounded strictly convex subset of n-space intersecting the line $x_1 = \cdots = x_n$ at the single point $(w, \ldots, w)$, then the only sequence $\{z_m\}$, $-\infty < m < \infty$, for which $P_m = (z_{m+1}, \ldots, z_{m+n}) \in S$ for all m is $z_m = w$ for all m.


14. On Ratios of Certain Algebraic Forms. Robert V. Hogg, State University of Iowa.


Let x and y be random variables having a continuous cumulative distribution function, and let $M(u, t) = E[\exp(ux + ty)]$ exist in the neighborhood of the origin of the u, t plane. Subject to certain conditions, a necessary and sufficient condition for the stochastic independence of y and x/y is $(\partial^k/\partial u^k)M(0, t) = K_k(\partial^k/\partial t^k)M(0, t)$ $(k = 0, 1, 2, \ldots)$, where $K_k$ is evaluated by setting $t = 0$. This result is used in the study of certain ratios of quadratic and linear forms. In dealing with the quadratic forms, the sample arises from a normal population with mean zero. A necessary and sufficient condition is determined for the stochastic independence of $Q_2$ and $Q_1/Q_2$, where essentially $Q_1 = a_1x_1^2 + \cdots + a_nx_n^2$ and $Q_2 = b_1x_1^2 + \cdots + b_nx_n^2$. In the linear case, however, the distribution is unspecified. Then it is found that the requirement of the stochastic independence of $L_2$ and $L_1/L_2$ implies that the sample arose from a gamma type distribution. Here $L_1 = a_1x_1 + \cdots + a_nx_n$ and $L_2 = x_1 + \cdots + x_n$.


15. The Economics of Sampling. Norman Rudy, Sacramento State College.


An optimum single sampling plan for acceptance inspection of attributes is developed by the method of minimizing the maximum risk. The first application is to warehouse or surveillance inspection, in which the value of a good item, g, and the cost of a bad item, b, define a breakeven quality, $p_0$. It is shown that under these conditions, and with sampling cost a linear function of sample size, $s + tn$, the optimum sample size is approximately equal to $[(.085\ \text{lot size})/t]^{2/3}(bg)^{1/3}$, the optimum acceptance number is approximately equal to $np_0$, and the $\min_n \max_p$ of the risk is approximately equal to $s + .58(tbg)^{1/3}(\text{lot size})^{2/3}$. The more general case, where the breakeven quality $p_0$ is determined by trade practice or contract, is also worked out, but cannot be presented in completely analytic form. A simple table involving the quotient of the normal integral and the normal density is required. Given this and the cost parameters of the situation, the sample size and the acceptance number which minimize the maximum risk are determined from relatively simple expressions.
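The approximate warehouse-case plan is direct arithmetic. The sketch below evaluates the three approximations for a given lot size, unit sampling cost t, fixed cost s, b, g and $p_0$; all numerical inputs are hypothetical. It also checks an internal consistency of the constants: since $3 \times 0.085^{2/3} \approx 0.580$, the variable part of the minimax risk is about three times the sampling cost $tn$ at the optimum sample size, as one would expect if a cost linear in n is balanced against a loss proportional to $1/\sqrt{n}$.

```python
def minimax_plan(lot_size, t, b, g, p0, s=0.0):
    """Approximate minimax single sampling plan (warehouse case), per the abstract:
    n ~ [(.085 * lot_size) / t]^(2/3) * (b g)^(1/3), acceptance number ~ n p0,
    min-max risk ~ s + .58 * (t b g)^(1/3) * lot_size^(2/3)."""
    n = (0.085 * lot_size / t) ** (2 / 3) * (b * g) ** (1 / 3)
    c = n * p0
    risk = s + 0.58 * (t * b * g) ** (1 / 3) * lot_size ** (2 / 3)
    return n, c, risk

# Hypothetical inputs: lot of 10,000 items, sampling cost 0.50 per item, fixed cost 20,
# good item worth 1.00, bad item costs 4.00, breakeven quality 0.2.
n, c, risk = minimax_plan(10_000, t=0.5, b=4.0, g=1.0, p0=0.2, s=20.0)
print(round(n), round(c), round(risk, 2))
print((risk - 20.0) / (0.5 * n))   # close to 3
```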







16. Exact Tests of Serial Correlation Using Noncircular Statistics. G. S. Watson, University of Cambridge, and J. Durbin, London School of Economics.


The paper shows how noncircular statistics for testing hypotheses of serial independence may be constructed for which exact distributions can be obtained using results given by R. L. Anderson ("Distribution of the serial correlation coefficient," Annals of Math. Stat., Vol. 13 (1942), pp. 1-13). The statistics are derived by throwing away a small amount of relevant information. As an example, the statistic

$$c_1 = \frac{x_1x_2 + \cdots + x_{m-1}x_m + x_{m+1}x_{m+2} + \cdots + x_{2m-1}x_{2m}}{\sum_{i=1}^{2m} x_i^2}$$

may be used for testing independence in a series of 2m observations whose mean is known to be zero. The quadratic form in the numerator of $c_1$ is based on a matrix whose roots are pair-wise equal, so that the distribution of $c_1$ when the x's are normal with the same variance is known from the results of R. L. Anderson. Tests of the errors in certain regression models may be made by fitting separate regressions to the two halves of the series and substituting the residuals in expressions similar to $c_1$. Exact tests can be obtained in this way for polynomial regressions, one-way, two-way, etc. classifications, and periodic regressions. The statistics appear to have power comparable with that of the related circular statistics against alternative hypotheses specified by a stationary Markoff process. In many cases occurring in practice, however, serial correlation of the errors will be due to systematic behaviour arising from the inadequacy of the theoretical model to represent the true relationship. The statistics proposed will often be preferable to circular statistics in such cases.
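The statistic $c_1$ and the matrix of its numerator are easy to write down. The sketch below computes $c_1$ and builds the symmetric matrix of the numerator's quadratic form: it is block diagonal with two identical $m \times m$ tridiagonal blocks (entries 1/2 next to the diagonal), which is why its roots occur in equal pairs. Function names and the sample series are hypothetical.

```python
def c1(x):
    """Noncircular statistic c_1 for 2m observations with known zero mean: lag-one
    cross products within each half of the series, divided by the sum of squares."""
    n = len(x)
    if n % 2:
        raise ValueError("need an even number 2m of observations")
    m = n // 2
    num = sum(x[i] * x[i + 1] for i in range(m - 1))       # x1x2 + ... + x_{m-1}x_m
    num += sum(x[i] * x[i + 1] for i in range(m, n - 1))   # x_{m+1}x_{m+2} + ... + x_{2m-1}x_{2m}
    return num / sum(v * v for v in x)

def c1_matrix(m):
    """Symmetric 2m x 2m matrix A with x'Ax equal to the numerator of c_1."""
    n = 2 * m
    A = [[0.0] * n for _ in range(n)]
    for start in (0, m):                        # the two identical diagonal blocks
        for i in range(start, start + m - 1):
            A[i][i + 1] = A[i + 1][i] = 0.5
    return A

x = [0.5, -1.2, 0.3, 2.0, -0.7, 1.1]
A = c1_matrix(3)
quad = sum(x[i] * A[i][j] * x[j] for i in range(6) for j in range(6))
print(c1(x), quad / sum(v * v for v in x))   # the two values agree
```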


17. Stochastic Difference Equations with a Continuous Time Parameter. (Preliminary Report). S. G. Ghurye, University of North Carolina.


Given a discrete sequence of observations ordered equidistantly in time, it is often assumed that this discrete process is explained by a stochastic difference equation with a purely random "disturbance". However, this observed discrete process might be the result of observations on a stochastic process X(t) in which t is not discrete, but continuous. Is it possible to have a process X(t), defined for t real, such that given any real $t_0$ and any real $h > 0$, the sequence $\{X(t_0 + jh)\}$, $j = 0, 1, \ldots$, satisfies the equation

$$X(t_0 + [j + p]h) + a_1(h)X(t_0 + [j + p - 1]h) + \cdots + a_p(h)X(t_0 + jh) = \delta(t_0 + jh),$$

δ being a linear function of mutually independent random variables having a common c.d.f. which is independent of h? The cases p = 1 and p = 2 are dealt with in detail, and the possible forms of such processes are derived; the further problem for any p, as well as for a system of equations, is being considered. It is also proposed to tackle the problems of estimation and testing which arise in this connection.


18. Nonsequential Problems in the Case of k Hypotheses. (Preliminary Report). Herman Chernoff, University of Illinois.


Suppose that there are k possible simple hypotheses $H_1, H_2, \ldots, H_k$ and that a possibly infinite set of actions may be taken. To a decision function there corresponds a vector $\rho = (\rho_1, \rho_2, \ldots, \rho_k)$, where $\rho_i$ is the risk if $H_i$ is true. The closure of the range of ρ is convex in the nonatomic case and in the randomized case. In the randomized case the closure of the range of ρ is the convex hull of the closure of the range of ρ in the nonrandomized case. (The randomized case is the one in which a number is selected at random from the unit interval before an action is taken.) The range of ρ is closed under suitable closure conditions on the range of the weight function.







19. The Moments of a Multinormal Distribution after One-sided Truncation of Some or All Coordinates. Z. W. Birnbaum and Paul L. Meyer, University of Washington.


Let $X = (X_1, X_2, \ldots, X_p)$ be a multinormal random variable with given first and second moments and the probability density $f(X_1, X_2, \ldots, X_p)$. The random variable $Y = (Y_1, Y_2, \ldots, Y_p)$ is said to be obtained from X by truncation to the set $X_i > \tau_i$, $i = 1, 2, \ldots, p$, if its probability density is $g(Y_1, \ldots, Y_p) = Cf(Y_1, \ldots, Y_p)$ for $Y_1 > \tau_1, Y_2 > \tau_2, \ldots, Y_p > \tau_p$, and $g(Y_1, \ldots, Y_p) = 0$ elsewhere. The problem considered is to determine the mathematical expectations of products of powers of the $Y_i$. Explicit formulae are obtained for the first and second moments $E(Y_i)$ and $E(Y_iY_j)$, and recursion formulae are given for the general case. (Research done under the sponsorship of the Office of Naval Research.)


20. An Algorithm for the Determination of all Solutions of a Two-Person Zero Sum Game with a Finite Number of Strategies. H. Raiffa, G. L. Thompson, and R. M. Thrall, University of Michigan.


Consider a zero-sum two-person game in which each player has a finite number of strategies. A computational procedure is given for finding the value of the game and all optimal basic strategies for each player. The basic computations required are evaluation of linear forms and solution of linear equations in one unknown. This method, based on geometric reasoning, is a step by step process with no more stages than the total number of strategies for the two players.


21. A Note on the Convolution of Uniform Distributions. Edwin G. Olds, Carnegie Institute of Technology.


Let $X_i$ be independent random variables with probability density functions $[\epsilon(X_i) - \epsilon(X_i - a_i)]/a_i$, where $\epsilon(x - c)$ is unity for $x > c$ and zero elsewhere. This paper gives a simple proof that the probability density function for $S = \sum_1^n X_i$ is

$$\frac{S^{n-1}\epsilon(S) - \sum_i (S - a_i)^{n-1}\epsilon(S - a_i) + \sum_{i<j}(S - a_i - a_j)^{n-1}\epsilon(S - a_i - a_j) - \cdots + (-1)^n\left(S - \sum a_i\right)^{n-1}\epsilon\left(S - \sum a_i\right)}{(n-1)!\,\prod a_i}.$$

A sufficient condition for the asymptotic normality of S is $0 < \alpha < a_i < \beta$ (finite). For the special case where $a_{i+1} = ra_i$ the necessary and sufficient condition for asymptotic normality is $r = 1$. For $0 < r < 0.5$ or $r > 2$ the probability that S will be outside the interval $\mu_S \pm 3\sigma_S$ is zero. From the Edgeworth series for the distribution function of the standardized sum it follows that $F(-3) = 0.00135 - 0.004\left[\sum a_i^4/\left(\sum a_i^2\right)^2\right]$, where the bracketed expression takes its minimum value $n^{-1}$ when all of the $a_i$'s are equal. These results are useful in connection with the problem of random assembly.
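The density formula is an alternating sum over subsets of the $a_i$ and can be evaluated directly. The sketch below does so (function names hypothetical), checks it on the triangular case $a = (1, 1)$ and by numerical integration over $(0, \sum a_i)$, and evaluates the bracket $\sum a_i^4/(\sum a_i^2)^2$ used in the Edgeworth approximation of $F(-3)$.

```python
import math
from itertools import combinations

def uniform_sum_density(s, a):
    """Density at s of X_1 + ... + X_n, X_i independent uniform on (0, a_i):
    sum over subsets J of (-1)^|J| (s - sum J)^(n-1) eps(s - sum J), over (n-1)! prod a_i."""
    n = len(a)
    total = 0.0
    for k in range(n + 1):
        for J in combinations(a, k):
            u = s - sum(J)
            if u > 0:                      # eps(u) is 1 for u > 0 and 0 otherwise
                total += (-1) ** k * u ** (n - 1)
    return total / (math.factorial(n - 1) * math.prod(a))

def edgeworth_f_minus3(a):
    """F(-3) ~ 0.00135 - 0.004 * [sum a_i^4 / (sum a_i^2)^2] for the standardized sum."""
    bracket = sum(x ** 4 for x in a) / sum(x ** 2 for x in a) ** 2
    return 0.00135 - 0.004 * bracket

a = [1.0, 2.0, 0.5]
steps = 20_000
width = sum(a) / steps
area = sum(uniform_sum_density((j + 0.5) * width, a) for j in range(steps)) * width
print(uniform_sum_density(1.0, [1.0, 1.0]), area)   # triangle peak 1.0; area close to 1
print(edgeworth_f_minus3([1.0] * 4))                # equal a_i: bracket equals 1/n
```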


22. On the Consistency of Certain Estimates of the Linear Structural Relation. Elizabeth L. Scott, University of California, Berkeley.


Let $\{x_i, y_i\}$ denote n independent pairs of observations on x, y, where $x = \xi + u$ and $y = \alpha + \beta\xi + v$, with ξ, u and v random variables with finite variances, $E(u) = E(v) = 0$, and ξ independent of the pair u, v. Procedure (1): Fix $a \le b$ such that

$$P\{x \le a\} > 0, \qquad P\{x > b\} > 0.$$

Let $\bar{X}_1, \bar{Y}_1$ stand for the arithmetic means of the $x_i$'s and $y_i$'s, respectively, for which $x_i \le a$, and $\bar{X}_2, \bar{Y}_2$ for those for which $x_i > b$. As an estimate of β, consider, say, $b_1 = (\bar{Y}_2 - \bar{Y}_1)/(\bar{X}_2 - \bar{X}_1)$. Procedure (2): Let $\bar{X}_1, \bar{Y}_1$ denote the arithmetic means of the $x_i$'s and $y_i$'s, respectively, for which $x_i$ is one of the r smallest of the $x_i$'s, and $\bar{X}_2, \bar{Y}_2$ for those for which $x_i$ is one of the s largest, with r, s preassigned, $r < n - s + 1$. The corresponding estimate of β is, say, $b_2$, defined as above. Let $(\mu, \nu)$ denote the shortest interval such that $P\{\mu \le u \le \nu\} = 1$. Theorem 1. In order that $b_1$ preserve the property of being a consistent estimate of β irrespective of the value of β, $-\infty < \beta < \infty$, it is n.a.s. that $P\{a - \nu < \xi \le a - \mu\} = P\{b - \nu < \xi \le b - \mu\} = 0$. Now let $r = p_1 n$, $s = p_2 n$, and let m, M be the corresponding percentile points such that $P\{x \le m\} = p_1$ and $P\{x > M\} = p_2$. Theorem 2. If $n \to \infty$ while $p_1$ and $p_2$ are held constant, the n.a.s. condition that $b_2$ preserve the property of being a consistent estimate of β irrespective of the value of β, $-\infty < \beta < \infty$, is that $P\{m - \nu < \xi \le m - \mu\} = P\{M - \nu < \xi \le M - \mu\} = 0$. Similar estimates were considered, for $p_1 = p_2 = \frac{1}{2}$ and u and v independent, by A. Wald (Annals of Math. Stat., Vol. 11 (1940), pp. 295-297), who showed sufficiency.
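Procedure (1) is a grouping estimator and takes only a few lines. The sketch below simulates a case where the condition of Theorem 1 holds by construction: ξ is uniform on $[0,1] \cup [3,4]$, the errors u and v are uniform on $(-0.5, 0.5)$ (so $\mu = -0.5$, $\nu = 0.5$), and $a = b = 2$, so that $P\{a - \nu < \xi \le a - \mu\} = P\{1.5 < \xi \le 2.5\} = 0$. These distributions and the true values $\alpha = 1$, $\beta = 2$ are illustrative assumptions.

```python
import random

def grouping_slope(xs, ys, a, b):
    """Procedure (1): b_1 = (Ybar_2 - Ybar_1) / (Xbar_2 - Xbar_1), where group 1 has
    x_i <= a and group 2 has x_i > b (a <= b)."""
    lo = [(x, y) for x, y in zip(xs, ys) if x <= a]
    hi = [(x, y) for x, y in zip(xs, ys) if x > b]
    def means(group):
        return (sum(x for x, _ in group) / len(group), sum(y for _, y in group) / len(group))
    (x1, y1), (x2, y2) = means(lo), means(hi)
    return (y2 - y1) / (x2 - x1)

def simulate(n=20_000, alpha=1.0, beta=2.0, seed=5):
    """Pairs x = xi + u, y = alpha + beta xi + v under the illustrative assumptions."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        xi = rng.random() + (3.0 if rng.random() < 0.5 else 0.0)   # uniform on [0,1] U [3,4]
        xs.append(xi + rng.uniform(-0.5, 0.5))
        ys.append(alpha + beta * xi + rng.uniform(-0.5, 0.5))
    return xs, ys

xs, ys = simulate()
print(grouping_slope(xs, ys, a=2.0, b=2.0))   # close to beta = 2
```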


23. A 3-decision Problem Concerning the Mean of a Normal Population. R. R. Bahadur, University of Chicago.


Given n independent observations $x_1, x_2, \ldots, x_n$ from a normal population having an unknown mean θ and unknown variance $\sigma^2$, suppose that the statistician is asked to say whether the unknown mean is $> c$ or $< c$, where c is a given constant (which is supposed henceforth to be zero), or to say that he would rather reserve judgement on the matter. In the present problem (which was suggested by Professor R. C. Bose as a modification of the problems considered in "The Problem of the Greater Mean" [R. R. Bahadur and H. Robbins, Annals of Math. Stat., Vol. 21 (1950), pp. 469-487]), reserving judgement is considered to be undesirable, and the possibility of doing so is admitted only for the purpose of reducing the probability of the statistician making an incorrect assertion. For any procedure d which associates each sample with one of the three decisions "assert θ > 0", "assert θ < 0", and "reserve judgement", let $a(d \mid \theta, \sigma) = \Pr(\text{"incorrect assertion" using } d \mid \theta, \sigma)$, $b(d \mid \theta, \sigma) = \Pr(\text{"reserve judgement" using } d \mid \theta, \sigma)$, and set

$$\alpha(d \mid \theta) = \sup_\sigma\{[a(d \mid \theta, \sigma) + a(d \mid -\theta, \sigma)]/2\},$$

$$\beta(d \mid \theta) = \sup_\sigma\{[b(d \mid \theta, \sigma) + b(d \mid -\theta, \sigma)]/2\}.$$

The class of procedures $\{d_\tau\}$ is defined as follows: for any τ, $0 < \tau < \infty$, $d_\tau$ = "assert θ > 0 if $\bar{x} > \tau s$, assert θ < 0 if $\bar{x} < -\tau s$, and reserve judgement otherwise", where $\bar{x} = n^{-1}\sum_1^n x_i$ and $s^2 = n^{-1}\sum_1^n (x_i - \bar{x})^2$. One of the results obtained concerning the class $\{d_\tau\}$ is as follows. Corresponding to any d there exists a $d_\tau$ such that $\alpha(d_\tau \mid \theta) \le \alpha(d \mid \theta)$ and $\beta(d_\tau \mid \theta) \le \beta(d \mid \theta)$ for all θ. In particular, given p $(0 < p < \frac{1}{2})$, there (evidently) exists a $\tau(p)$ $(0 < \tau(p) < \infty)$ such that $\sup_\theta\{\alpha(d_{\tau(p)} \mid \theta)\} = p$, and if d is any other procedure such that $\sup_\theta\{\alpha(d \mid \theta)\} \le p$, then $\beta(d \mid \theta) \ge \beta(d_{\tau(p)} \mid \theta)$ for all θ. These results provide a justification of the manner in which the two-sided t test of a normal mean is sometimes used in practice.


24. Consistent Estimate of the Slope of a Linear Structural Relation. J. Neyman, University of California, Berkeley, and Charles M. Stein, University of Chicago.


Let $Z_n$ denote the system of 8n independent pairs of measurements $(X_{ik}, Y_{ik})$, for $i = 1, 2, \ldots, n$ and $k = 1, 2, \ldots, 8$, of two nonobservable random variables $\xi_i$ and $\eta_i = \alpha \operatorname{cosec}\beta - \xi_i \cot\beta$, where α and β are constants. Variable ξ is nonnormal. It is assumed that any nonnormal components of the errors of measurement $X_{ik} - \xi_i$ and $Y_{ik} - \eta_i$ are mutually independent, and independent of ξ and of the normal components of the errors. The normal components of errors may be correlated but as a pair are independent of ξ. For every $n \ge 4$, let $m(n)$ be the greatest integer not exceeding $\sqrt{n}$. Let $\Delta(n) = \pi/(m(n) - 1)$ and $b_{nj} = -\pi/2 + (j - 1)\Delta(n)$, for $j = 1, 2, \ldots, m(n)$. For every b, $|b| \le \pi/2$, and for







i=1,2,--:,nlet Ai = exp {[-4[(Xun — Xe + Xs — Xun) cos b+ (Yun — Yo + Y 

— Yx) sin bP — 4(Xun — Xie + Xis — Xis)*}], Bi = exp [-4(Yun — Yio + Yor — Yis)*} 

C; = exp {[-4(¥Ya — Yia + Yio — Yir)*}, Di = exp {[-—3(Yis — Yis + Yes — Yos)*}, and 
finally, $G(b, Z_n) = \left[\sum_{i=1}^{n} A_i(B_i - 2C_i + D_i)\right]/n$. Let $g(Z_n)$ be the smallest of the m(n) values of the function $G(b, Z_n)$ computed for $b_{n1}, b_{n2}, \ldots, b_{n\,m(n)}$, and let $T(Z_n)$ denote the smallest of the $b_{nj}$ for which $G(b_{nj}, Z_n) = g(Z_n)$. Theorem. As $n \to \infty$, the function $T(Z_n)$ thus defined is a consistent estimate of β. The present problem grew out of the problem of identifiability of β studied by Olav Reiersøl (Econometrica, Vol. 18 (1950), pp. 375-389). The results obtained here represent a generalization of the previous results of one of the authors presented at the I.M.S. meeting in Boulder, Colorado, as the Second Rietz Memorial Lecture, September, 1949.


25. A Remark on Almost Sure Convergence. Michel Loève, University of California, Berkeley.


A criterion for almost sure convergence is given. It contains criteria of Kolmogorov, 
Marcinkiewicz, and P. Lévy. 


26. A Significance Test for Differences Among Ranked Treatments in an Analysis of Variance. D. B. Duncan, Virginia Polytechnic Institute.


Given a set of n treatment means (or totals) $x_1, x_2, \ldots, x_n$, it is often desired to decide whether each of the differences $x_i - x_j$ is significant, that is, whether each of the hypotheses $H_{ij}: \mu_i > \mu_j$, $i, j = 1, 2, \ldots, n$, $i \ne j$, can be accepted. A test is obtained for this purpose under the conditions which usually apply, or are taken to apply, in many analyses of variance, namely that $x_1, x_2, \ldots, x_n$ is a random sample from n normal populations with means $\mu_1, \mu_2, \ldots, \mu_n$, respectively, and a common unknown variance $\sigma^2$ for which the common form of independent estimate $s^2$ based on $n_2$ degrees of freedom is available. In approaching the problem the complete Wald multiple decision function form of analysis is found to be too unwieldy for a general case and is waived in favor of a simpler set of requirements. These state that an α-level test should provide likelihood ratio tests as closely as possible for each of the ${}_nC_r$ hypotheses that any combination of r of the treatment means are equal. Also, satisfactory upper limits should be placed on the significance level of the whole test with respect to each of these particular ${}_nC_r$ hypotheses. The test obtained satisfies the given requirements better than other currently available procedures. It consists of a fairly simple sequence of range-like tests followed by variance tests, which are presented in detail together with examples.


27. On Information and Sufficiency. S. KULLBACK, George Washington University,
AND R. A. LEIBLER, Washington, D. C.


For probability spaces $(X, S, \mu_i)$, $i = 1, 2$, and probability measures $\lambda, \mu_1, \mu_2$ absolutely
continuous with respect to each other in pairs, $f_i$, $i = 1, 2$, is defined by

$$\mu_i(E) = \int_E f_i(x)\, d\lambda(x) \quad \text{for all } E \in S.$$

Then $I_{1:2}(E) = [1/\mu_1(E)] \int_E f_1(x)[\log f_1(x) - \log f_2(x)]\, d\lambda(x)$ for $\mu_1(E) > 0$, and $I_{1:2}(E) = 0$
for $\mu_1(E) = 0$, is defined as the mean information for discrimination between $H_1$ and $H_2$
per observation from $E \in S$ for $\mu_1$, where $H_i$ is the hypothesis that $x$ is selected from the
population with probability measure $\mu_i$. $J_{1:2}(E)$, the divergence between the populations
in $E$, is defined as $I_{1:2}(E) + I_{2:1}(E)$ or

$$J_{1:2}(E) = \int_E \left[\frac{f_1(x)}{\mu_1(E)} - \frac{f_2(x)}{\mu_2(E)}\right][\log f_1(x) - \log f_2(x)]\, d\lambda(x).$$

Properties of $I$ and $J$ are considered, and the relations of $I$ to the information notions of
Fisher, Shannon and Wiener and of $J$ to Mahalanobis' generalized distance are noted. In
particular, it is proved that a transformation $T$ never increases $I_{1:2}(X)$, and a necessary and
sufficient condition that $T$ leave $I_{1:2}(X)$ unchanged is that $T$ be a sufficient statistic.
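To make the definitions of $I_{1:2}(E)$ and $J_{1:2}(E)$ concrete, here is a minimal sketch for the discrete case. It assumes that $\lambda$ is counting measure on a finite set $\{0, \ldots, k-1\}$, so that $f_1$, $f_2$ are ordinary probability mass functions and the integrals become sums; the function names are hypothetical and natural logarithms are used.

```python
import math
from typing import Iterable, Optional, Sequence


def mean_information(f1: Sequence[float], f2: Sequence[float],
                     E: Optional[Iterable[int]] = None) -> float:
    """I_{1:2}(E) for mass functions f1, f2 on {0, ..., k-1} (lambda = counting measure).

    E is an iterable of indices and defaults to the whole space.
    """
    idx = list(range(len(f1))) if E is None else list(E)
    mu1 = sum(f1[i] for i in idx)
    if mu1 == 0:
        return 0.0  # by definition I_{1:2}(E) = 0 when mu_1(E) = 0
    total = 0.0
    for i in idx:
        if f1[i] > 0:  # terms with f1 = 0 contribute 0 (0 log 0 = 0)
            # math.log raises ValueError if f2[i] == 0, i.e. if absolute continuity fails
            total += f1[i] * (math.log(f1[i]) - math.log(f2[i]))
    return total / mu1


def divergence(f1: Sequence[float], f2: Sequence[float],
               E: Optional[Iterable[int]] = None) -> float:
    """J_{1:2}(E) = I_{1:2}(E) + I_{2:1}(E)."""
    idx = None if E is None else list(E)  # materialize so E can be reused twice
    return mean_information(f1, f2, idx) + mean_information(f2, f1, idx)
```

For example, with `f1 = [0.5, 0.5]` and `f2 = [0.25, 0.75]` this gives $I_{1:2} = \tfrac12 \log \tfrac43$ and $J_{1:2} = \tfrac14 \log 3 \approx 0.275$. Merging the first two cells of `[0.2, 0.3, 0.5]` and `[0.5, 0.3, 0.2]` (a transformation that is not sufficient) lowers $I_{1:2}$ from about 0.275 to about 0.223 and $J_{1:2}$ from about 0.550 to about 0.416, in keeping with the property that a transformation never increases the information.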


28. Asymptotic Theory of Certain “Goodness of Fit” Criteria Based on Stochastic
Processes. T. W. ANDERSON, Columbia University, AND D. A. DARLING,
University of Michigan.

The statistical problem treated is that of testing the hypothesis that a sample of n
independent, identically distributed random variables has the common continuous
distribution function $F(x)$. If $F_n(x)$ is the empirical cumulative distribution function and
$\psi(u)$ is some nonnegative weight function ($0 \le u \le 1$), we consider

$$K_n = n^{1/2} \sup_{-\infty < x < \infty} \left\{ |F(x) - F_n(x)| \sqrt{\psi[F(x)]} \right\}$$

and $W_n^2 = n \int_{-\infty}^{\infty} [F(x) - F_n(x)]^2 \psi[F(x)]\, dF(x)$. For suitable choices of $\psi$ these criteria have
been considered by Kolmogorov, Cramér, von Mises, Smirnov, and others. A unified method
for calculating the limiting distributions of $K_n$ and $W_n^2$ is developed by reducing them
to corresponding problems in stochastic processes, which in turn lead to more or less classical
eigenvalue and boundary value problems for special classes of differential equations.
For certain weight functions we give explicit limiting distributions. For $\psi = 1$ we obtain,
e.g., the Kolmogorov distribution and the $\omega^2$ distribution of Smirnov and von Mises for
$K_n$ and $W_n^2$, respectively. By courtesy of the numerical analysis section of the Rand
Corporation, a tabulation of the $\omega^2$ distribution has been prepared. (This work was supported
by the Rand Corporation.)
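For the special weight $\psi = 1$, both criteria can be computed from the ordered values $u_{(i)} = F(x_{(i)})$. The sketch below is an illustration for that case only: it uses the standard closed forms of the Kolmogorov and Cramér-von Mises statistics, assumes $F$ is continuous and fully specified, and the function name `edf_statistics` is hypothetical. It does not implement the general-$\psi$ criteria or the limiting distributions derived in the paper.

```python
import math
from typing import Callable, Sequence, Tuple


def edf_statistics(sample: Sequence[float],
                   cdf: Callable[[float], float]) -> Tuple[float, float]:
    """Return (K_n, W_n^2) for weight psi = 1 against a fully specified continuous cdf."""
    n = len(sample)
    if n == 0:
        raise ValueError("sample must be non-empty")
    # u_(1) <= ... <= u_(n): probability-integral transforms F(x_(i)) in increasing order
    u = sorted(cdf(x) for x in sample)
    # For continuous F, sup |F - F_n| is attained at a jump of F_n,
    # so only the gaps just before and just after each jump need checking.
    d_plus = max((i + 1) / n - ui for i, ui in enumerate(u))
    d_minus = max(ui - i / n for i, ui in enumerate(u))
    k_n = math.sqrt(n) * max(d_plus, d_minus)
    # Closed form of n * integral (F - F_n)^2 dF when psi = 1
    w2 = 1.0 / (12 * n) + sum((ui - (2 * i + 1) / (2 * n)) ** 2 for i, ui in enumerate(u))
    return k_n, w2
```

As a usage example, the sample `[0.1, 0.4, 0.7]` tested against the uniform distribution ($F(x) = x$) gives $K_3 = 0.3\sqrt{3} \approx 0.520$ and $W_3^2 = 0.06$; the latter agrees with integrating $3\int_0^1 (x - F_3(x))^2\,dx$ piece by piece.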


29. The Effect of Preliminary Tests of Significance on the Size and Power of
Certain Tests of Univariate Linear Hypotheses with Special Reference to
the Analysis of Variance. (Preliminary Report). ROBERT E. BECHHOFER,
Columbia University.


Let $X_1, \ldots, X_q$; $Y_1, \ldots, Y_r$; $Z_1, \ldots, Z_s$ be normally and independently distributed
with means $0, \ldots, 0$; $\mu_1, \ldots, \mu_r$; $\nu_1, \ldots, \nu_s$, respectively, and variance $\sigma^2$. The
null hypothesis is $H_0: \nu_1 = \cdots = \nu_s = 0$. The standard test ($T_1$) of $H_0$ is an F-test involving
$\sum_{k=1}^{s} Z_k^2 / \sum_{i=1}^{q} X_i^2$. If $\mu_1 = \cdots = \mu_r = 0$, a more powerful test ($T_2$) of $H_0$ is an F-test
involving $\sum_{k=1}^{s} Z_k^2 / (\sum_{i=1}^{q} X_i^2 + \sum_{j=1}^{r} Y_j^2)$. However, if $\sum_{j=1}^{r} \mu_j^2/\sigma^2$ should be large, $T_2$ would
have low power. When the statistician believes (based on past experience) that the $\mu$'s
equal zero, but wishes to protect himself against the possibility that they do not, he can
use a preliminary F-test ($T_0$); i.e., he pools (uses $T_2$) or does not pool (uses $T_1$) according
as $\sum_{j=1}^{r} Y_j^2 / \sum_{i=1}^{q} X_i^2$ is less than or greater than some preassigned constant. The power of
the composite test [$T = (T_0$ plus $T_1$ or $T_2)$] depends on $q, r, s$; the levels of significance
$\alpha_0, \alpha_1, \alpha_2$ associated with $T_0, T_1, T_2$, respectively; and $\lambda_2 = \sum_{j=1}^{r} \mu_j^2/2\sigma^2$ (the nuisance
parameter) and $\lambda_1 = \sum_{k=1}^{s} \nu_k^2/2\sigma^2$. Formulae are derived for the size (Type I error) and
power of $T$. The behavior of the size and power as a function of $\lambda_2$ and $\lambda_1$ is characterized.
It is shown that certain choices of $\alpha_0, \alpha_1, \alpha_2$ yield tests $T$ which have desirable properties.


(Part of this work was carried out under the sponsorship of the Office of Naval Research.) 
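The pooling rule of the composite test can be sketched as a function that chooses $T_1$ or $T_2$ and returns the corresponding F ratio with its degrees of freedom. This is a hedged illustration, not the author's size and power formulae: the name `composite_test`, the preassigned constant `c`, the convention that a ratio exactly equal to `c` leads to $T_1$, and the usual scaling of each sum of squares by its degrees of freedom are assumptions made here. Critical values at levels $\alpha_1$ or $\alpha_2$ would still be read from F tables; they are omitted to keep the sketch dependency-free.

```python
from typing import NamedTuple, Sequence, Tuple


class ChosenTest(NamedTuple):
    name: str             # "T1" (no pooling) or "T2" (pooled error)
    f_statistic: float
    df: Tuple[int, int]   # (numerator df, denominator df)


def sum_of_squares(values: Sequence[float]) -> float:
    return sum(v * v for v in values)


def composite_test(x: Sequence[float], y: Sequence[float],
                   z: Sequence[float], c: float) -> ChosenTest:
    """Apply the preliminary test T0, then return the F ratio of T1 or T2."""
    q, r, s = len(x), len(y), len(z)
    sx, sy, sz = sum_of_squares(x), sum_of_squares(y), sum_of_squares(z)
    if sy / sx < c:
        # T0 says "pool": the Y's join the X's in the error estimate, giving T2
        return ChosenTest("T2", (sz / s) / ((sx + sy) / (q + r)), (s, q + r))
    # T0 says "do not pool": only the X's estimate the error, giving T1
    return ChosenTest("T1", (sz / s) / (sx / q), (s, q))
```

For instance, with `x = [1.0, 1.0]`, `y = [0.5]`, `z = [2.0]` the preliminary ratio is 0.125; a constant `c = 1.0` pools and yields $F = 16/3$ on $(1, 3)$ degrees of freedom, whereas `c = 0.1` keeps $T_1$ with $F = 4$ on $(1, 2)$.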


30. The Exact Distribution of the Extremal Quotient. E. J. GUMBEL, New York,
AND L. H. HERBACH, Columbia University.
The distribution of the extremal quotient $q$ (the ratio of the largest value $x_n$
to the smallest $x_1$ of $n$ independent observations taken from the same distribution) is
obtained in four stages: three special cases, (1) $x_1 \ge 0$, $x_n \ge 0$, $q \ge 1$; (2) $x_1 \le 0$, $x_n \le 0$,
$0 \le q \le 1$; (3) $x_1 \le 0$, $x_n \ge 0$, $q \le 0$; culminating in the general case (4) $-w_1 \le x_1 \le x_n \le w_2$,
$-\infty < q < \infty$. The common procedure in the first three cases is to integrate out
the extreme from the joint distribution of one extreme and the extremal quotient. Geo- 
metric considerations give the appropriate regions of integration. The general case is 
obtained by a composition of cases (3), (2), and (1). For symmetrical initial distributions 
there exist only two branches which join at q = 1, and the probability function may be 
written in a symmetrical form. When n = 2, the distribution of q for a symmetrical dis- 
tribution is symmetrical about zero and invariant under a reciprocal transformation, and 
if the initial distribution possesses no moments and does not vanish at $x = 0$, the density
of probability becomes infinite at $q = 0$. The distribution of q is not affected by changes
in scale but is very sensitive to changes in origin. For a uniform distribution, the extremal 
quotient of a nonnegative variate has just the opposite qualities of the extremal quotient 
of a nonpositive variate. For variates changing sign, the extremal quotient is asymp- 
totically negative. 
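Two of the qualitative statements above are easy to check numerically: $q$ is unchanged by a change of scale but altered by a shift of origin, and for a variate that changes sign, $q$ is negative for large samples. The sketch below is an illustration only; the standard normal population, the sample size, the number of trials and the seed are arbitrary choices not taken from the abstract, and reading "asymptotically negative" as "negative with probability tending to one" is an interpretation.

```python
import random
from typing import Sequence


def extremal_quotient(sample: Sequence[float]) -> float:
    """q = x_n / x_1: the largest observation divided by the smallest."""
    smallest = min(sample)
    if smallest == 0:
        raise ZeroDivisionError("extremal quotient undefined when the smallest value is 0")
    return max(sample) / smallest


def fraction_negative(n: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo share of standard-normal samples of size n whose q is negative."""
    rng = random.Random(seed)
    negative = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if extremal_quotient(sample) < 0:
            negative += 1
    return negative / trials


if __name__ == "__main__":
    print(extremal_quotient([1.0, 2.0, 4.0]))   # 4.0
    print(extremal_quotient([3.0, 6.0, 12.0]))  # 4.0: unchanged by a change of scale
    print(extremal_quotient([2.0, 3.0, 5.0]))   # 2.5: changed by a shift of origin
    print(fraction_negative(n=50, trials=200))
```

For $n = 50$ a normal sample of one sign only has probability $2^{-49}$, so the reported fraction is essentially 1.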


31. The Distributions of the t and F Statistics for a Class of Nonnormal Populations.
RALPH A. BRADLEY, Virginia Polytechnic Institute.


Series expansions of the cumulative distribution functions of $t$ and of $F$ in powers of
$t^{-1}$ and $F^{-1}$ are obtained. The general method of derivation presented is valid for populations
with density functions $f(u)$ such that $f(u) > 0$, $f(u)$ is continuous, and $f(u)$ has continuous
derivatives for all values $-\infty < u < \infty$. The coefficients of terms in these expansions
are reduced from integrals of multiplicity equal to the sample size to products of
coefficients, common to all populations of the class defined above, and integrals of no
greater multiplicity than the number of groups of observations in the sample. Selected
values of the common coefficients are given, as well as illustrative examples for the Cauchy
and “squared hyperbolic secant” populations.


32. Note on the Behavior of the Characteristic Function of a Random Variable
at Zero. M. ROSENBLATT, University of Chicago.

Let $X$ be a random variable with characteristic function $\varphi(z)$. Let $X_n = X$ when $|X| \le n^{1/\alpha}$
and let $X_n = 0$ when $|X| > n^{1/\alpha}$. The following theorems are proved: (1) $1 - \varphi(z) =
o(|z|^\alpha)$, $0 < \alpha < 1$, at $z = 0$ if and only if $n \Pr(|X| > n^{1/\alpha}) = o(1)$. (2) $1 - \varphi(z) =
o(|z|^\alpha)$, $1 < \alpha < 2$, at $z = 0$ if and only if $n \Pr(|X| > n^{1/\alpha}) = o(1)$ and $E(X_n) = o(1)$.
The results are obtained by making use of W. Feller's necessary and sufficient conditions
for the weak law of large numbers (see W. Feller, Acta Univ. Szeged, Vol. 8 (1937), pp.
191-201).
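As a worked illustration of the tail condition in theorem (1) (an example chosen here, not taken from the note), assume a hypothetical Pareto-type tail $\Pr(|X| > x) = x^{-\beta}$ for $x \ge 1$, with $\beta > 0$. Since $n^{1/\alpha} \ge 1$ for $n \ge 1$, the condition reduces to a comparison of exponents:

$$n \Pr\left(|X| > n^{1/\alpha}\right) = n \left(n^{1/\alpha}\right)^{-\beta} = n^{1 - \beta/\alpha},$$

which is $o(1)$ if and only if $\beta > \alpha$. Under this assumption, for $0 < \alpha < 1$, theorem (1) gives $1 - \varphi(z) = o(|z|^\alpha)$ at $z = 0$ exactly when the tail index $\beta$ exceeds $\alpha$.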




NEWS AND NOTICES 
Readers are invited to submit to the Secretary of the Institute news items of interest. 


Personal Items 


Dr. R. R. Bahadur, who received his Ph.D. in mathematical statistics from 
the University of North Carolina in June, 1950, is now an instructor in the Com- 
mittee on Statistics of the University of Chicago. 

Dr. T. A. Bancroft, Associate Professor of Statistics, Iowa State College, has
been appointed Head of the Department of Statistics and Director of the Statis- 
tical Laboratory at Iowa State College. 







Dr. Geoffrey Beall, formerly with the Research Laboratory of Swift & Co., 
Chicago, Illinois, has accepted a professorship in statistics in the Department of 
Mathematics, University of Connecticut, Storrs. 

Dr. Archie Black has resigned his position as government statistician to devote 
more time to his work as Treasurer and Mathematical Consultant with the 
Mechanical Research Corporation of Chicago. 

Professor David Blackwell of Howard University has been appointed Visiting 
Professor of Statistics at Stanford University for 1950-51. 

Mr. Nils Blomqvist of the University of Stockholm has been appointed as In- 
structor in Mathematics and Statistics at Boston University for the academic 
year of 1950-51. 

Professor Roque G. Carranza has started a course of twelve lectures on funda- 
mentals of probability and statistics at Colegio Libre de Estudios Superiores 
in Buenos Aires which is a private nonprofit institution for higher studies. 

Dr. Andrew L. Comrey, formerly an Assistant Professor of Psychology at the 
University of Illinois, has accepted a position as Assistant Professor of Psychol- 
ogy and Public Administration, Department of Psychology, University of South- 
ern California, Los Angeles. 

Dr. D. R. Cowan, who has been conducting many research projects for the 
coal industry and for various steel, food, paint, appliance, oil and other com- 
panies, has been appointed Professor of Marketing in the School of Business 
Administration, University of Michigan, Ann Arbor. 

Dr. Robert E. Greenwood has been recalled to active duty by the Navy De- 
partment and is on leave from the Department of Applied Mathematics of the 
University of Texas. 

Dr. Leo A. Goodman, who was a Social Science Research Council Research 
Training Fellow in the Mathematical Statistics Section of Princeton University, 
is now an Assistant Professor teaching statistics in the Sociology Department of 
the University of Chicago. 

Mr. W. C. Hoffman is a thesis fellow at the Institute for Numerical Analysis 
during 1950-51. 

Professor Paul Horst has returned to the Department of Psychology, Univer- 
sity of Washington, Seattle. He took a year’s leave of absence during 1949-1950 
to be Director of Research in the Educational Testing Service at Princeton, New 
Jersey. 

Dr. Stanley L. Isaacson, who was a Naval Research Assistant at Columbia 
University, has been appointed Assistant Professor of Statistics in the Statistical 
Laboratory at Iowa State College, effective September, 1950, where he will be 
employed in research and teaching. Dr. Isaacson spent last summer working as 
a Mathematical Statistician with the Operations Research Office in Washington, 
D.C. 

Dr. Allyn W. Kimball has resigned his position as Experimental Statistician 
at the USAF School of Aviation Medicine, Randolph Air Force Base, and will 
be on the Mathematics Panel at the Oak Ridge National Laboratory. 







Mr. William Kruskal is now associated with the Committee on Statistics at the 
University of Chicago. 

Professor E. L. Lehmann is on leave of absence from the University of Cali- 
fornia, Berkeley, for the academic year 1950-51. He is teaching at Columbia 
University during the first semester and at Princeton University the second. 

Mr. Garnet McCreary was awarded at the Commencement June 15, 1950, 
the degree of Doctor of Philosophy in Statistics at Iowa State College. His dis- 
sertation was entitled “Cost Functions for Sample Surveys.’’ He has been ap- 
pointed Assistant Professor in the Department of Mathematics, University of 
Manitoba, Winnipeg, Canada, effective September 1950, where he will be em- 
ployed in teaching, consulting and research. 

Professor William B. Michael, who held the position of Assistant Professor of 
Psychology at Princeton University for the past three years, accepted an appoint- 
ment as Associate Professor of Psychology at San Jose State College commenc- 
ing September, 1950. 

Dr. Stanley W. Nash, formerly Associate in Mathematics at the University of 
California, Berkeley, has accepted an appointment as Assistant Professor of 
Mathematics and Research Consultant in Statistics at the University of British 
Columbia at Vancouver, effective July 1, 1950. 

Dr. H. W. Norton has resigned as statistician of the Accountability Branch 
of the Atomic Energy Commission to become Professor of Agricultural Statistics 
in the Illinois Agricultural Experiment Station and the University of Illinois. 

Dr. A. E. Paull, formerly biometrician for the Grain Research Laboratory, 
Board of Grain Commissioners, Winnipeg, Canada, has accepted a position as 
Associate Statistician in the Department of Statistical Research, Abitibi Power 
& Paper Company, Limited, Toronto, Canada. 

Dr. Paul Peach, formerly Associate Professor at the Institute of Statistics, 
University of North Carolina, is now head of the Data Analysis Branch, Test 
Department at the Naval Ordnance Test Station, China Lake, California. 

Dr. Raymond P. Peterson, formerly a research fellow at the Institute for Nu- 
merical Analysis, National Bureau of Standards, Los Angeles, received his doc- 
torate from University of California at Los Angeles in June and has accepted a 
position as Instructor of Mathematics at the University of Washington, Seattle. 

Dr. P. Ratoosh, formerly a lecturer at Columbia University, is now an Instruc- 
tor in the Department of Psychology at the University of Wisconsin. 

Mr. Joseph S. Rhodes has accepted a position as Mathematical Statistician in 
the office of the United States Air Force Comptroller, Washington, D. C. He is 
acting in the capacity of Mathematical Advisor to the Director of Statistical 
Services on the design of sample surveys. 

Miss Rosemary Savey has accepted a position as Instructor in Statistics and
Research Assistant in the Bureau of Business Research, University of Toledo.

Dr. Esther Seiden, formerly lecturer and research fellow at the University of 
California, Berkeley, accepted an assistant professorship in the Department of 
Statistics, School of Business Administration, University of Buffalo, New York. 







Abraham Wald 


Abraham Wald, a Fellow of the Institute, and Mrs. Wald were killed in a 
plane crash in India on December 13, 1950. Professor Wald was born in Cluj, 
Romania, October 31, 1902. His academic training was received at the University 
of Cluj (L.M., 1927) and the University of Vienna (Ph.D., 1931). He was a re- 
search associate of the Austrian Institute for Business Cycle Research until 1938 
when he came to the United States. He had since been associated with Columbia 
University, becoming Professor of Mathematical Statistics and Executive Officer 
of the Department of Mathematical Statistics. Professor Wald made important 
contributions to the fields of mathematical statistics and probability, pure 
mathematics, and mathematical economics. He had written almost 100 papers 
in these fields as well as two books, Sequential Analysis and Statistical Decision 
Functions. 

Professor Wald had been a president of the Institute, a member of the Coun- 
cil of the Institute and of the Editorial Board of the Annals, a vice-president 
and Fellow of the American Statistical Association, and a Fellow of the Econo- 
metric Society. 


Statistics Summer Session at Virginia Polytechnic Institute 


The Department of Statistics, Virginia Polytechnic Institute, will hold a 
special summer session August 8-25, 1951, for graduate students, research 
workers, and technicians in government and industry. Special emphasis will be 


given to statistics in economics and engineering. Several visiting professors will 
participate in the lecturing. For details write to the Department of Statistics, 
Virginia Polytechnic Institute, Blacksburg, Virginia. 




The following Ph.D. degrees with major in mathematical statistics were 
granted at the University of North Carolina in 1950: 


Raghu Raj Bahadur, “On a Class of Decision Problems in the Theory of k Populations”; minor: Experimental Statistics and Mathematics.

Kenneth A. Bush, “Orthogonal Arrays”; minor: Mathematics and Economics.

Max Halperin, “Estimation in Truncated Sampling Processes”; minor: Experimental Statistics and Mathematics.

Sharad-Chandra S. Shrikhande, “Construction of Partially Balanced Designs and Related Problems”; minor: Experimental Statistics.

Shanti A. Vora, “Bounds on the Distribution of Chi-square”; minor: Experimental Statistics.







New Members 


The following persons have been elected to membership in the Institute. 
(September 1, 1950 to November 30, 1950) 


Berger, Richard, M.A. (Columbia Univ.), Statistician-Economist, General Aniline & Film 
Corporation, 230 Park Avenue, New York 17, New York. 

Boll, C. H., B.S. (Stanford Univ.), Graduate Student in Statistics, Stanford University, 
1247 Cowper, Palo Alto, California. 

Chacon, Enrique, S. J., Ph.D. (Univ. of Madrid), Professor of Statistics, University of 
Deusto, Apartado 1, Bilbao, Spain.

Curcio, F. L., M.S. (Univ. of Pa.), Graduate Student, Department of Mathematical Sta- 
tistics, Columbia University, 559 Broadlawn Terrace, Vineland, New Jersey. 

Derman, Cyrus, A.M. (Univ. of Pa.), Graduate Student, Department of Mathematical 
Statistics, Columbia University, 449 MacDade Boulevard, Collingdale, Pennsylvania.

de Finetti, Bruno, Ph.D. (Univ. of Milano), Professor, University of Trieste, via Coroneo 
43, Trieste, Italy. 

Geisser, Seymour, B.A. (City College of N. Y.), Graduate Student, University of North
Carolina, B-Dormitory, Room 312, Chapel Hill, North Carolina. 

Getchell, B. C., Ph.D. (Univ. of Mich.), Research Analyst, Department of Defense, 903 
N. Wayne St., Apt. $305, Arlington, Virginia. 

Guerreiro, Amaro, B. Litt. (Oxford Univ.), Head of Research Division, Instituto Nacional 
de Estatistica, 48, Av. Marques de Tomar, Lisbon, Portugal.

Haddad, R. K., B.A. (N. Y. Univ.), Teaching Assistant in Psychology, Graduate School 
of Arts and Sciences, New York University, 43-02 63 Street, Woodside, New York. 

Hogg, R. V., Ph.D. (Univ. of Iowa), Assistant Professor, Department of Mathematics and 
Astronomy, State University of Iowa, Iowa City, Iowa. 

Inselmann, E. H., A.B. (Temple Univ.), Graduate Student, Columbia University, 4253 N. 
6th St., Philadelphia 40, Pennsylvania. 

Kitagawa, Tosio, Ph.D. (Tokyo Univ.), Professor of Mathematical Statistics, Kyushu 
University, and Chief, Committee of Research Association of Statistical Sciences, 
Faculty of Science, University of Kyushu, Fukuoka, Japan.

Kurtz, T. E., A.B. (Knox College, Ill.), Research Assistant, Mathematics Department, 
Princeton University, Fine Hall, Box 708, Princeton, New Jersey. 

Lanteli, Gunnar, Fil. Kand. (Lund), Actuary of Försäkringsaktiebolaget Hansa, Stockholm
7, Sweden.

Leimbacher, W. R., Dipl. Math. (Switzerland), Teaching Assistant at Federal Institute 
of Technology, 64, Ringstresse, Zurich 57, Switzerland. 

McCall, C. H., Jr., A.B. (George Wash. Univ.), Assistant in Statistics and Graduate Stu- 
dent, George Washington University, 6701-44th Street, Chevy Chase 15, Maryland. 

Matérn, Bertil, Fil. Lic. (Stockholm Univ.), Assistant Professor, Swedish Forest Research 
Institute, Lappkarrsvagen 47, Stockholm 50, Sweden. 

Matthias, R. H., A.B. (Amherst College), Graduate Student, Department of Mathematical 
Statistics, University of North Carolina, Purefoy Road, Chapel Hill, North Carolina. 

Ortiz, C. L. B., Ingeniero Civil (Paris), Asesor Analista, Dirección Nacional de Estadística,
Contraloria General de la Republica, Bogota, Colombia. 

Poch, F. A., Licenciado en Ciencias (Univ. of Madrid), Official of Instituto Nacional de
Estadistica; Specialist of Section of Methodology; Assistant Professor of Mathematical
Statistics, University of Madrid, Federico Rubio, 106, Madrid, Spain.

Sastry, N.S.R., Ph.D. (London School of Econ.), Officer and Director of Statistics, Depart- 
ment of Research and Statistics, Reserve Bank of India, Post Bag No. 1036, Bombay 
1, India. 







Skibinsky, Morris, B.S. (City College of N. Y.), Graduate Student, Department of Math- 
ematical Statistics, University of North Carolina, Room $23 B-Dormitory, Chapel 
Hill, North Carolina. 

Sullivan, J. R., M.A. (Georgetown Univ.), Instructor in Mathematics, Clemson College, 
South Carolina (on leave); and Graduate Student, University of North Carolina, 
18-C, Lennox, Chapel Hill, North Carolina. 

Törnqvist, Leo, Ph.D. (Åbo Akademi), Professor in Statistics, Institute of Statistics, University
of Helsinki, Helsinki, Suomi (Finland).

von Guerard, Hermann W., Director, Statist. Amts., Lambertusstr. 1, Dusseldorf, Ger- 
many. 

Wunsche, Gunther, Dipl. Math. (Tech. Univ. of Dresden), Chefmathematiker und Prok- 
urist, Universitatsdozent, Lenbachplatz 4, Munich 2, Germany. 




REPORT OF THE CHICAGO MEETING OF THE INSTITUTE 


The thirteenth Annual Meeting and forty-fifth meeting of the Institute of 
Mathematical Statistics was held in Chicago, December 27-29, 1950. Head- 
quarters were at the Congress Hotel. Sessions were held at the Congress Hotel, 
Roosevelt College and the Palmer House. One or more sessions were held in 
conjunction with one or more of the following organizations: the American Sta- 
tistical Association, the Econometric Society, the American Association of Uni- 
versity Teachers of Insurance, the American Economic Association, the American 
Farm Economic Association, the American Marketing Association, the Ameri- 
can Psychological Association, the American Public Health Association, the 
American Society for Quality Control (Chicago Section), the Association for Com- 
puting Machinery, the Biometric Society (Eastern North American Region), 
the Population Association of America, and the Psychometric Society. The fol- 
lowing 266 members of the Institute attended: 


Helen Abbey, F. S. Acton, Beatrice Aitchison, A. A. Alchian, J. E. Alman, R. L. Anderson,
T. W. Anderson, E. E. Ard, K. J. Arnold, K. J. Arrow, Max Astrachan, G. J. Auner,
R. R. Bahadur, E. W. Bailey, J. C. Bain, T. A. Bancroft, E. W. Barankin, Walter Bartky,
W. D. Baten, R. E. Bechhofer, B. M. Bennett, Richard Berger, Z. W. Birnbaum, C. I. Bliss,
Isadore Blumen, C. R. Blyth, A. H. Bowker, R. A. Bradley, Dorothy S. Brady, A. E. Brandt,
M. F. Bresnahan, C. A. Bridger, Jean Bronfenbrenner, I. D. J. Bross, R. W. Burgess,
L. D. Calvin, J. M. Cameron, E. S. Cansado, A. G. Carlton, C. G. Carlyle, O. S. Carpenter,
Maria Castellani, F. R. Cella, Herman Chernoff, Randolph Church, W. G. Cochran,
C. H. Coombs, Jerome Cornfield, J. H. Cover, Gertrude M. Cox, C. C. Craig, E. L. Crow,
S. L. Crump, E. E. Cureton, Cuthbert Daniel, D. A. Darling, Besse B. Day, F. R. Del Priore,
D. B. DeLury, W. E. Deming, B. W. Dempsey, Lucile Derrick, J. L. Doob, H. F. Dorn,
D. B. Duncan, C. W. Dunnett, David Durand, A. M. Dutton, P. S. Dwyer, Churchill Eisenhart,
Lila Elveback, Benjamin Epstein, H. P. Evans, W. D. Evans, W. T. Federer, Robert Ferber,
J. W. Fertig, Evelyn Fix, M. M. Flood, E. J. Frank, L. R. Frankel, D. A. S. Fraser,
H. A. Freeman, H. C. Fryer, R. P. Gage, M. A. Girshick, Mary A. Goins, L. A. Goodman,
Roe Goodman, B. G. Greenberg, S. W. Greenhouse, J. A. Greenwood, L. E. Grosh, F. A. Gross,
F. E. Grubbs, Harold Gulliksen, L. S. Gunlogson, John Gurland, R. K. Haddad, R. J. Hader,
K. W. Halbert, Max Halperin, F. J. Halton, M. H. Hansen, H. H. Harman, T. E. Harris,
H. L. Harter, Mina Haskind, P. M. Hauser, W. C. Healy, Jr., F. M. Hemphill, J. L. Hodges, Jr.,
William Hodgkinson, Jr., R. G. Hoffmann,







J. F. Hofmann, R. V. Hogg, H. B. Horton, D. G. Horvitz, Harold Hotelling, E. E. Houseman,
W. G. Howard, C. J. Hoyt, Leonid Hurwicz, P. E. Irick, S. L. Isaacson, J. E. Jackson,
C. M. Jaeger, A. T. James, E. H. Jebe, R. J. Jessen, H. L. Jones, L. B. Kahn, Leo Katz,
L. S. Kellogg, H. J. Kelly, Oscar Kempthorne, Nathan Keyfitz, A. W. Kimball, Jr.,
E. P. King, L. R. Klein, L. A. Knowler, Lila F. Knudsen, C. F. Kossack, R. L. Kozelka,
K. H. Kramer, William Kruskal, Solomon Kullback, T. E. Kurtz, R. A. Leibler, H. O. Levine,
G. J. Lieberman, Gilbert Lieberman, J. E. Lieberman, R. F. Link, Michel Loève, G. F. Lunger,
W. G. Madow, C. J. Maloney, John Mandel, Nathan Mantel, E. S. Marks, Mary Marquardt,
Jacob Marschak, Margaret P. Martin, Pat Maxwell, Jr., K. O. May, P. J. McCarthy,
G. E. McCreary, D. C. McCune, P. W. McGann, F. E. McIntyre, Brockway McMillan,
Margaret Merrill, Robert Mirsky, A. M. Mood, R. H. Morris, J. E. Morton, L. E. Moses,
Jack Moshman, Frederick Mosteller, B. D. Mudgett, Hugo Muench, M. R. Neifeld,
C. J. Nesbitt, Jerzy Neyman, R. T. Nichols, M. L. Norden, J. I. Northam, H. W. Norton,
G. B. Oakland, E. G. Olds, P. S. Olmstead, Bernard Ostle, Toby Oxtoby, A. E. Paull,
M. P. Peisakoff, B. E. Phillips, Frank Proschan, Joan E. Raup, L. J. Reed, J. S. Rhodes,
P. R. Rider, B. A. Rojas, C. F. Roos, S. N. Roy, M. M. Sandomire, F. E. Satterthwaite,
L. J. Savage, Henry Scheffé, M. A. Schneiderman, Elizabeth L. Scott, R. H. Shaw,
R. W. Shephard, Jack Sherman, W. A. Shewhart, I. H. Siegel, Jack Silber, P. B. Simpson,
Rosedith Sitgreaves, H. F. Smith, J. H. Smith, Milton Sobel, Herbert Solomon, L. D. Sommers,
F. A. Sorensen, Mortimer Spiegelman, E. W. Stacy, B. R. Stauber, R. G. D. Steel, C. M. Stein,
H. W. Steinhaus, F. F. Stephan, O. F. Stewart, J. V. Sturtevant, B. R. Suydam,
Zenon Szatrowski, J. V. Talacko, Dan Teichroew, J. G. C. Templeton, B. J. Tepping,
D. J. Thompson, G. R. Treanor, A. E. Treloar, J. W. Tukey, G. W. Tyler, S. A. Tyler,
S. A. Vora, D. F. Votaw, Jr., Helen M. Walker, D. L. Wallace, W. A. Wallis, F. A. Weck,
Samuel Weiss, M. E. Wescott, Eric Weyl, Phillips Whidden, S. S. Wilks, C. P. Winsor,
J. Wolfowitz, M. A. Woodbury, Holbrook Working, W. J. Youden, R. K. Zeigler.


At 10 a.m., Wednesday, December 27, 1950, the American Statistical Association
joined the Institute in one of two sessions held at that time for contributed
papers. Albert H. Bowker of Stanford University presided. The following papers
were presented:


1. Cost Functions for Sample Surveys. Preliminary Report. Garnet E. McCreary, University
of Manitoba and Iowa State College.
2. On a Preliminary Test for Pooling Mean Squares in the Analysis of Variance. A. E.
Paull, Abitibi Power & Paper Company, Limited, Toronto, Canada.
3. Estimation for Sub-Sampling Designs Employing the County as a Primary Sampling
Unit. Emil H. Jebe, Iowa State College and North Carolina State College.
4. The Probability Distribution of the Number of Isolates in a Social Group. Leo Katz,
Michigan State College.
5. Estimating Population Size Using Sequential Sampling Tagging Methods. Leo A.
Goodman, University of Chicago.
6. Application of the Distribution of a Linear Form in Chi-square Variates. Arthur
Grad and Herbert Solomon, Office of Naval Research, Washington, D. C.
7. A Large Sample t-statistic which is Insensitive to Nonrandomness. (By Title.) John
E. Walsh, The Rand Corporation.


At the second session for contributed papers held at 10 a.m., Wednesday, 
December 27, 1950, K. J. Arnold of the University of Wisconsin presided. The 
following papers were presented: 


8. Conditional Expectation and Convex Functions. E. W. Barankin, University of California,
Berkeley.
9. Transformation Parameters. Melvin P. Peisakoff, The Rand Corporation.
10. A Generalization of the Neyman-Pearson Fundamental Lemma. Henry Scheffé,
Columbia University.
11. Nonparametric Estimation V, Sequentially Determined Statistically Equivalent
Blocks. D. A. S. Fraser, University of Toronto.
12. A Bayes Approach to a Quality Control Model. M. A. Girshick and Herman Rubin,
Stanford University.
13. On the Translation Parameter Problem for Discrete Variables. David Blackwell,
Stanford University.
14. On Ratios of Certain Algebraic Forms. Robert V. Hogg, State University of Iowa.
15. The Economics of Sampling. (By Title.) Norman Rudy, Sacramento State College.
16. Exact Tests of Serial Correlation Using Noncircular Statistics. (By Title.) G. S.
Watson, University of Cambridge, and J. Durbin, London School of Economics.
17. Stochastic Difference Equations with a Continuous Time Parameter. Preliminary
Report. (By Title.) S. G. Ghurye, University of North Carolina.
18. Nonsequential Problems in the Case of k Hypotheses. Preliminary Report. (By Title.)
Herman Chernoff, University of Illinois.


Also at 10 a.m., Wednesday, December 27, 1950, the Institute joined the 
American Statistical Association (Biometrics Section) and the Biometric Society
(Eastern North American Region) in a session on Statistical Problems in Radio- 
Biology. A. E. Brandt of the United States Atomic Energy Commission was 
chairman. The papers presented were Gene Mutations in Populations by Bruce 
Wallace of the Long Island Biological Laboratory, Long-Term Radiation Experi- 
ment in Dogs by S. Lee Crump of the University of Rochester, and Metabolism
of Labeled Carbon Compounds by Hardin B. Jones of the University of California 
at Berkeley. The papers were discussed by H. Fairfield Smith of the University 
of North Carolina and C. W. Sheppard of the Oak Ridge National Laboratory. 

At 2 p.m., Wednesday, December 27, 1950, the American Statistical Associa- 
tion and the American Society for Quality Control (Chicago Section) joined the 
Institute in a session devoted to an address, Statistical Control, by W. A. Shew- 
hart of the Bell Telephone Laboratories. E. G. Olds of the Carnegie Institute of 
Technology presided. 

Also at 2 p.m., Wednesday, December 27, 1950, the Institute joined the Ameri- 
can Statistical Association (Biometrics Section), the American Farm Economic 
Association, the Biometric Society (Eastern North American Region), and the 
Psychometric Society in a session on Theory of Variance Components. W. J. You- 
den of the National Bureau of Standards was chairman. The papers presented 
were The Present Status of Variance Component Analysis by S. Lee Crump of the 
University of Rochester, Testing a Linear Relation Among Variances by William 
G. Cochran of Johns Hopkins University, and Application to Regression and to 
Errors of Measurement by John W. Tukey of Princeton University. The papers 
were discussed by A. M. Mood of the Rand Corporation. 

Also at 2 p.m., Wednesday, December 27, 1950, the Institute joined the Ameri- 
can Statistical Association and the American Association of University Teachers 
of Insurance in a session on Developments in Actuarial Science. Cecil J. Nesbitt 
of the University of Michigan was chairman. The papers presented were Survey 
of Theoretical Developments by Charles A. Spoerl of the Aetna Life Insurance 







Company, and Survey of Practical Applications by E. A. Lew and Frank Weck 
of the Metropolitan Life Insurance Company. The papers were discussed by 
Alfred Guertin of the American Life Convention, Chicago. 

At 4 p.m., Wednesday, December 27, 1950, the American Statistical Associa- 
tion and the Econometric Society joined the Institute in a session devoted to a 
Half-Century of Progress address, Multivariate Analysis, by T. W. Anderson 
of Columbia University. M. A. Girshick of Stanford University presided. 

Also at 4 p.m., Wednesday, December 27, 1950, the Institute joined the Ameri- 
can Statistical Association (Biometrics Section), American Society for Quality 
Control (Chicago Section), and the Biometric Society (Eastern North American 
Region) in a session on Precision of Measurements. W. Edwards Deming of the 
Division of Statistical Standards was chairman. The papers presented were The 
Specification of Precision of Measurements by Churchill Eisenhart of the National 
Bureau of Standards, The Estimation of Precision of Measurements by Frank E. 
Grubbs of the Aberdeen Proving Grounds, and Estimate of Precision of Textile 
Instruments by John C. Whitwell of Princeton University. The papers were dis- 
cussed by H. Fairfield Smith of the University of North Carolina. 

Also at 4 p.m., Wednesday, December 27, 1950, the Institute joined the 
American Statistical Association (Business and Economic Statistics Section), 
the American Economic Association, and the Econometric Society in a session 
on Analysis of Choices Involving Risk. Jacob Marschak of the Cowles Commis- 
sion for Research in Economics was chairman. The papers presented were Alter- 
native Approaches to Theory of Choice in Risk-Taking Situations by Kenneth J. 
Arrow of Stanford University and An Experimental Measurement of Utility by 
Frederick Mosteller of Harvard University. The papers were discussed by Ar- 
men Alchian of the University of California at Los Angeles and Franco Modigli- 
ani of the University of Illinois. 

At 10 a.m., Thursday, December 28, 1950, the American Statistical Associa- 
tion joined the Institute in a session devoted to a Half-Century of Progress ad- 
dress, Non-Parametric Inference, by A. M. Mood of the Rand Corporation. P. 
S. Dwyer of the University of Michigan presided. 

Also at 10 a.m., Thursday, December 28, 1950, the Institute joined the Ameri- 
can Statistical Association and the American Society for Quality Control (Chi- 
cago Section) in the first session on Engineering. Frederick J. Halton, Jr., of 
Deere & Company, was chairman. The paper presented was Statistics in Produc- 
tion and Inspection by Edwin G. Olds of the Carnegie Institute of Technology. 
The paper was discussed by Warren E. Jones of Desplaines, Illinois, and Charles 
A. Bicking of Hercules Powder Company. 

Also at 10 a.m., Thursday, December 28, 1950, the Institute joined the Ameri- 
can Statistical Association (Section on the Training of Statisticians), the Ameri- 
can Psychological Association, and the Psychometric Society in a session on 
Statistical Literacy in the Social Sciences. The address was given by Helen M. 
Walker of Columbia University. Philip M. Hauser of the University of Chicago 
was chairman. 

Also at 10 a.m., Thursday, December 28, 1950, the Institute joined the Ameri- 
can Statistical Association (Biometrics Section) and the Biometric Society (East- 
ern North American Region) in a session on Statistical Methods in Pharmacology 
and Immunology. Lloyd C. Miller, director of Revision of the United States 
Pharmacopeia, was chairman. The papers presented were Collaborative Bio- 
assays by Lila F. Knudsen, Food and Drug Administration, and Statistical 
Methods in Immunology by Herbert C. Batson of the Army Medical Center. The 
papers were discussed by Everett Welker of the American Medical Association 
and George Hunt of Bristol Laboratories. 

At 2 p.m., Thursday, December 28, 1950, the American Statistical Association 
and the Econometric Society joined the Institute in a session devoted to an 
address, Some Recent Advances in the Theory of Decision Functions, by Jacob 
Wolfowitz of Columbia University. J. L. Doob of the University of Illinois pre- 
sided. 

Also, at 2 p.m., Thursday, December 28, 1950, the Institute joined the Ameri- 
can Statistical Association and the Population Association of America in a ses- 
sion on Developments in United States Census Taking. W. F. Ogburn of the 
University of Chicago was chairman. The papers presented were Role of Re- 
search in Census Taking by Morris Hansen, Bureau of the Census, Evaluation of 
Census Results by Eli Marks, Bureau of the Census, and Census Programs and
Operations by A. Ross Eckler, Bureau of the Census. The papers were discussed 
by Nathan Keyfitz of the Dominion Bureau of Statistics, Ottawa, and Vergil D. 
Reed of the J. Walter Thompson Company. 

Also at 2 p.m., Thursday, December 28, 1950, the Institute joined the Ameri- 
can Statistical Association, the American Psychological Association, and the 
Psychometric Society in a session on Statistical Problems and Psychological 
Theory. Allen Edwards of the University of Washington was chairman. The 
papers presented were Statistical Problems and Psychological Scaling by Clyde H. 
Coombs, University of Michigan, and Statistical Problems and Learning Theory 
by Kenneth W. Spence, State University of Iowa. The papers were discussed by 
Harold P. Bechtoldt, Iowa City, and Harold Gulliksen of the Educational Testing
Service.

Also at 2 p.m., Thursday, December 28, 1950, the Institute joined the Ameri- 
can Statistical Association (Biometrics Section) and the Biometric Society (East- 
ern North American Region) in a session on Applications of Variance Compo- 
nents. G. W. Snedecor of Iowa State College was chairman. The papers presented 
were Variance Components as a Tool for the Analysis of Sample Data by Walter 
A. Hendricks, United States Department of Agriculture, Consistency of Esti- 
mates of Variance Components by R. E. Comstock and H. F. Robinson of North 
Carolina State College, and Use of Components of Variance in Preparing Schedules 
for the Sampling of Baled Wool by J. M. Cameron of the National Bureau of 
Standards. The papers were discussed by Walter T. Federer of Cornell Univer- 
sity. 

Also at 2 p.m., Thursday, December 28, 1950, the Institute joined the American
Statistical Association and the American Society for Quality Control (Chicago
Section) in the second session on Engineering. W. Edwards Deming of the
Division of Statistical Standards was chairman. The papers presented were
Statistics in Engineering Research and Development by Ellis R. Ott of Rutgers
University, and Statistical Developments in South Africa by H. S. Sichel of the
Educational Testing Service.

At 4 p.m., Thursday, December 28, 1950, the Psychometric Society and the 
Econometric Society joined the American Statistical Association (Section on 
the Training of Statisticians) and the Institute in a session devoted to a Half- 
Century of Progress address and a Special Invited Paper, Statistical Inference, by 
Jerzy Neyman of the University of California at Berkeley. S. N. Roy of the 
University of North Carolina presided. 

Also at 4 p.m., Thursday, December 28, 1950, the Institute joined the Ameri- 
can Statistical Association (Biometrics Section, Business and Economic Statis- 
tics Section), the American Farm Economic Association, and the Biometric 
Society (Eastern North American Region) in a session on Sample Survey Tech- 
niques. W. F. Callander of Gainesville, Florida, was chairman. The papers 
presented were Double Sampling and the Curtis Impact Study by D. S. Robson
of Cornell University and Arnold J. King of National Analysts, Inc., Philadel- 
phia, Approaches to Agricultural Price Statistics by F. E. McVay and Henry 
Tucker of North Carolina State College, and Problems in Rural Surveys by R. L. 
Anderson and A. L. Finkner of North Carolina State College. The papers were 
discussed by B. R. Stauber of the Division of Agricultural Price Statistics and 
B. J. Tepping of the Bureau of the Census. 

Also at 4 p.m., Thursday, December 28, 1950, the Institute joined the Ameri- 
can Statistical Association, the Association for Computing Machinery, and the 
Psychometric Society in a session devoted to a Round Table: What Can High 
Speed Electronic Computing Equipment Do For and To Statistics? William G. 
Madow of the University of Illinois was moderator. Electronic Engineer Sam 
N. Alexander of the National Bureau of Standards and Expert User Byron
Schreiner of the A. C. Nielsen Company, Chicago, were the speakers. The papers
were discussed by Howard C. Grieves of the Bureau of the Census and John J. 
Finelli of the Metropolitan Life Insurance Company. 

At 10 a.m., Friday, December 29, 1950, the American Statistical Association 
joined the Institute in a session devoted to a Half-Century of Progress address, 
Surveys, by W. G. Madow of the University of Illinois. F. F. Stephan of Prince- 
ton University presided. 

Also at 10 a.m., Friday, December 29, 1950, the Institute joined the Eco- 
nometric Society in a session on Problems of Incorrect and Incomplete Specifica- 
tion. Merrill M. Flood of the Rand Corporation was chairman. The papers 
presented were Some Specification Problems and Their Applications to Economet- 
ric Models by Leonid Hurwicz of the University of Illinois and the Cowles Com- 
mission for Research in Economics, and An Approach to Effects of Non-Normality 
in Tests of Significance by William Kruskal of the University of Chicago. The 
papers were discussed by T. W. Anderson of Columbia University and John W. 
Tukey of Princeton University. 

Also at 10 a.m., Friday, December 29, 1950, the Institute joined the American 
Statistical Association, the Psychometric Society, and the American Psychologi- 
cal Association in a session on Factor Analysis as a Statistical Tool. The address 
was given by L. L. Thurstone of the University of Chicago. The paper was dis- 
cussed by E. E. Cureton of the University of Illinois. Harold Gulliksen of the 
Educational Testing Service was chairman. 

Also at 10 a.m., Friday, December 29, 1950, the Institute joined the American 
Statistical Association and the American Society for Quality Control (Chicago 
Section) in a session on Statistics in the Physical Sciences. The address was given 
by Walter Bartky of the University of Chicago. The paper was discussed by J. 
L. Doob of the University of Illinois. S. S. Wilks of Princeton University was
chairman. 

Also at 10 a.m., Friday, December 29, 1950, the Institute joined the American 
Statistical Association (Biometrics Section), the Biometric Society (Eastern 
North American Region), and the American Public Health Association in a ses- 
sion on Statistical Methods in Medicine. Hugo Muench of Harvard University 
was chairman. The papers presented were A Stochastic Model of Relapse, Death 
and other Risks Following a Treatment by Evelyn Fix and J. Neyman of the Uni- 
versity of California, Berkeley, The Design of Physiological and Clinical Investi- 
gations by Donald Mainland of New York University and J. W. Hopkins of the 
National Research Council, Ottawa, and Discriminatory Analysis by Joseph L. 
Hodges, Jr., of the University of California, Berkeley. The papers were dis- 
cussed by Samuel W. Greenhouse of the National Cancer Institute. 

At 2 p.m., Friday, December 29, 1950, the American Statistical Association 
joined the Institute in a session devoted to an address, Elements of Information 
Theory, by Brockway McMillan of the Bell Telephone Laboratories. Solomon 
Kullback of George Washington University presided. 

Also at 2 p.m., Friday, December 29, 1950, the Institute joined the Economet- 
ric Society and the American Economic Association in a session on Collection and 
Use of Survey Data. J. Neyman of the University of California at Berkeley was 
chairman. The papers presented were Sample Surveys of Households: A New 
Tool in Econometrics by Lawrence R. Klein of the University of Michigan, and 
Use of Sample Surveys of Business Expectations and Plans by Franco Modigliani 
of the University of Illinois. The papers were discussed by Paul F. Lazarsfeld of 
Columbia University. 

Two sessions for contributed papers were held at 4 p.m., Friday, December 
29, 1950. At one of these Oscar Kempthorne of Iowa State College presided. The 
following papers were presented:


19. The Moments of a Multinormal Distribution after One-sided Truncation of Some or 
All Coordinates. Z. W. Birnbaum and Paul L. Meyer, University of Washington. 

20. An Algorithm for the Determination of all Solutions of a Two-Person Zero Sum Game
with a Finite Number of Strategies. H. Raiffa, G. L. Thompson, and R. M. Thrall, 
University of Michigan. 

21. A Note on the Convolution of Uniform Distributions. Edwin G. Olds, Carnegie
Institute of Technology.

22. On the Consistency of Certain Estimates of the Linear Structural Relation. Elizabeth
L. Scott, University of California, Berkeley. 

23. A 3-Decision Problem Concerning the Mean of a Normal Population. R. R. Bahadur,
University of Chicago. 

24. Consistent Estimate of the Slope of a Linear Structural Relation. J. Neyman,
University of California, Berkeley, and Charles M. Stein, University of Chicago.

25. A Remark on Almost Sure Convergence. Michel Loève, University of California,
Berkeley. 


26. A Significance Test for Differences Among Ranked Treatments in an Analysis of
Variance. D. B. Duncan, Virginia Polytechnic Institute. 


At the fourth session for contributed papers, the second at 4 p.m., Friday, 
December 29, 1950, W. D. Baten of Michigan State College presided. The fol- 
lowing papers were presented: 


27. On Information and Sufficiency. S. Kullback, George Washington University, and
R. A. Leibler, Washington, D. C. 


28. Asymptotic Theory of Certain “Goodness of Fit” Criteria Based on Stochastic
Processes. T. W. Anderson, Columbia University, and D. A. Darling, University of
Michigan.


29. The Effect of Preliminary Tests of Significance on the Size and Power of Certain Tests
of Univariate Linear Hypotheses with Special Reference to the Analysis of Variance. 
Preliminary Report. Robert E. Bechhofer, Columbia University. 

30. The Exact Distribution of the Extremal Quotient. E. J. Gumbel, New York, and L.
H. Herbach, Columbia University. (The paper was read by J. A. Greenwood.) 

31. The Distributions of the t and F Statistics for a Class of Nonnormal Populations. Ralph 
A. Bradley, Virginia Polytechnic Institute. 

32. Note on the Behavior of the Characteristic Function of a Random Variable at Zero. 
M. Rosenblatt, University of Chicago. (Introduced by L. J. Savage.) 


Also at 4 p.m., Friday, December 29, 1950, the Institute joined the American 
Statistical Association (Section on the Training of Statisticians) in an R. A. 
Fisher Survey session. Gertrude Cox of the University of North Carolina was 
chairman. The papers presented were Revolution of Methods in Experimentation 
by W. J. Youden of the National Bureau of Standards, and The Impact of 
R. A. Fisher on Statistics by Harold Hotelling of the University of North Caro- 
lina. 

Also at 4 p.m., Friday, December 29, 1950, the Institute joined the American
Statistical Association (Business and Economic Statistics Section), the American
Marketing Association, the American Psychological Association, and the
Psychometric Society in a session on Measurement of Opinion. William G. Cochran
of Johns Hopkins University was chairman. The papers presented were
Implications for Factor Analysis of Lazarsfeld’s Latent Structure Theory by Bert J.
Green of Princeton University, discussed by Paul F. Lazarsfeld of Columbia
University; A New Approach to Thurstone’s Method of Scaling by Frederick
Mosteller of Harvard University, discussed by L. L. Thurstone of the University of
Chicago; and A Critical Analysis of Guttman’s Theory of Principal Components
in Attitude Measurement by Philip J. McCarthy of Cornell University.

A meeting of the 1950 Council was held on Wednesday, December 27, 1950, 
at 12:00 noon, Professor J. L. Doob presiding. The Annual Business Meeting 
was held on Wednesday, December 27, 1950, at 7:00 p.m., Professor J. L. Doob 
presiding. A meeting of the 1951 Council was held on Friday, December 29, 
1950, at 12:00 noon, Professor P. S. Dwyer presiding. The report of the Annual
Business Meeting appears elsewhere in this issue. 

K. J. ARNOLD 
Associate Secretary 


———


MINUTES OF THE ANNUAL MEMBERSHIP MEETING, 
CHICAGO, DECEMBER 27, 1950 


The meeting was called to order at 7:10 p.m. by President J. L. Doob. The 
annual reports of the President, Editor, and Secretary-Treasurer were read. They 
are printed elsewhere in this issue. 

The Acting Secretary moved that Article 2 of the By-Laws of the Institute 
be amended so that the first two sentences read: “Members shall pay ten dollars
at the time of admission to membership and shall receive the full current volume 
of the Official Journal. Thereafter Members shall pay ten dollars annual dues, 
of which seven dollars shall be for a subscription to the Official Journal.” and 
that exception D be amended to read: “Any Member who resides outside the
United States and Canada shall pay seven dollars annual dues.” The motion
carried. 

The President asked for instructions from the membership as to the procedure 
to be followed in filling the unexpired term of Abraham Wald. It was voted that 
the candidate for the Council receiving the fifth largest number of votes be de- 
clared elected for a term of one year. 

J. W. Tukey moved that it is the sense of this meeting that a four-day annual
meeting is preferable to a three-day meeting even if this means meeting alone
on the fourth day. The motion carried. 

Harold Hotelling moved the adoption of the following resolution: 


Whereas the death of Professor Abraham Wald, who with Mrs. Wald was killed in an 
airplane crash in India, deprives statistics of a vigorous, brilliant, and original con- 
tributor to its fundamental ideas; and 

Whereas the future of statistical methods will be vitally affected by Abraham Wald’s 
introduction of a formalized and accurate method of sequential analysis, and by his work 
on the foundations of statistical inference, including particularly the consideration of 
loss and risk functions, of general decision problems, of the minimax principle and the 
related theory of games, of the nature of the estimation of unknown quantities, and of 
the testing of hypotheses; and 

Whereas the efforts of American industry and the military and naval services of supply 
were materially aided in the successful conduct of the Second World War by widespread 
application of Abraham Wald’s work, particularly to the quality control of manufactured 
articles; and 

Whereas his contributions to statistical methods and theory were substantial in such 
varied fields as the foundations of probability, inequalities on distributions in terms of 
moments, the treatment of time series, long cycles resulting from repeated integration, 
tolerance limits, analysis of variance, asymptotic large-sample distributions, and the 
estimation of parameters of stochastic processes; and 

Whereas Abraham Wald contributed also to economics and economic statistics by his 
penetrating studies of equations of production and of general equilibrium, of index num- 
bers of cost of living, and of the determination of indifference loci by means of Engel 
curves; and 

Whereas he made in his earlier career in Europe valuable contributions to pure math- 
ematics in the fields of differential geometry and the axiomatization of metric spaces; 
and 

Whereas he served the American Statistical Association as Vice President and the In- 
stitute of Mathematical Statistics as President and as member of its Council and of the 
Editorial Board of the Annals of Mathematical Statistics; and 

Whereas great inspiration is to be derived from the example of Abraham Wald in his 
surmounting of the difficulties caused by the discrimination and restrictions that, in his 
East European environment, denied him the opportunities of the primary and secondary 
schools; in his entrance to the university in his native city of Klausenburg by examina- 
tions for which he had prepared himself; in his graduation with distinction and his bril- 
liant graduate work at the University of Vienna; in his migration to this country at the 
time of the fall of Austria; in his fortitude in enduring the loss of his nearest relatives by 
the Nazi policy of genocide; in his devotion to our science and in his habits of hard work 
which brought a great volume of substantial contributions; and in his ability to be 
friendly and kind under the severest strains; now therefore 

Be it resolved that the American Statistical Association and the Institute of Math- 
ematical Statistics jointly record their deepest sorrow and regret at the untimely passing 
away in middle life of a great contributor, and at the further tragedy that his wife also 
was taken; and that this Association extends its sincere sympathy and good wishes to 
the bereaved relatives and particularly to the two young children who remain. 


The resolution was adopted unanimously by a rising vote. 

The meeting adjourned at 8:05 p.m. 

After the meeting the tellers posted the results of the election as follows: 
President-Elect: M. A. Girshick
Members of the Council for 1951-1953: Harald Cramér, A. M. Mood, Jerzy Neyman, S. S. Wilks
Member of the Council for 1951: E. L. Lehmann

K. J. ARNOLD 
Acting Secretary 

REPORT OF THE PRESIDENT OF THE INSTITUTE FOR 1950 


The general affairs of the Institute marked time this year, as far as the Presi- 
dent’s office was concerned, but fortunately the Institute’s fine condition is not 
critically dependent on presidential activity. 

The situation at the University of California has caused considerable discus- 
sion and evoked strongly differing opinions from members of the Institute. 
While there was general agreement that a deplorable situation had been created 
by arbitrary actions of the regents, there was no agreement as to what, if any- 
thing, the Institute should do about it. (Several other scientific organizations, 
including the American Mathematical Society, have passed resolutions on the 
subject.) A committee, consisting of Paul S. Dwyer (chairman), M. A. Girshick, 
and Henry Scheffé, was appointed to report to the Council on the facts, and to 
make recommendations as to possible action. 

Institute activities at the year’s end were overshadowed by the news of the 
tragic deaths of Professor Wald and Mrs. Wald in India. It is needless to record 
here the central place of Wald’s work in the progress of statistics in recent years. 
A proper appreciation will be published in a later issue of the Annals. 

Five new Fellows were elected at the Christmas meeting: R. C. Bose, J. L. 
Hodges, Jr., O. Reiersøl, H. Rubin, L. J. Savage.

The following Nominating Committee was appointed to serve for the year 
1951: 


G. W. Brown, Chairman; K. J. Arrow; E. L. Lehmann; G. E. Nicholson; H. W. Norton;
J. W. Tukey


The following is a list of committees of the Institute in 1950: 


1. Committee to Encourage Membership outside the United States
   T. W. Anderson, Chairman; C. C. Hurd; M. Loève; J. Marschak
2. Committee on Tabulation
   H. O. Hartley, Chairman; C. I. Bliss; F. W. Dresch; C. Eisenhart; H. H. Germond;
   C. C. Hurd; A. N. Lowan; W. G. Madow; H. G. Romig; L. E. Simon
3. Committee on the Directory
   J. W. Tukey, Chairman; C. Eisenhart
4. Committee on Statisticians in the Government Service
   W. E. Deming, Chairman; C. Eisenhart
5. Committee for the Christmas Meeting
   W. G. Madow, Chairman; M. H. Hansen; M. Loève; J. E. Morton; J. Wolfowitz
6. Committee for the 1950 Spring Meeting in North Carolina
   H. Hotelling, Chairman; D. Blackwell; H. Geiringer; S. B. Littauer;
   D. F. Votaw, Jr.; S. S. Wilks
7. Midwest Program Committee
   O. Kempthorne, Chairman; L. Hurwicz; W. G. Madow; K. May; L. J. Savage;
   D. R. Whitney
8. West Coast Program Committee
   A. M. Mood, Chairman; Z. W. Birnbaum; A. H. Bowker; W. J. Dixon;
   J. L. Hodges, Jr.; P. G. Hoel
9. Program Committee for the Oak Ridge Meeting
   O. Kempthorne, Chairman; A. S. Householder; H. Levene; H. W. Norton; L. J. Savage
10. Committee to Look into Less Expensive Possibilities for Printing the Annals
   T. W. Anderson, Chairman; W. E. Deming; P. S. Dwyer; S. S. Wilks
11. Committee for Special Invited Papers
   W. G. Madow, Chairman; T. W. Anderson; O. Kempthorne; A. M. Mood
12. Committee to Revive the Statistical Research Memoirs
   H. Scheffé, Chairman; T. W. Anderson; W. Bartky; C. C. Hurd; G. Kuznets
13. Representative of the I. M. S. to the American Association for the Advancement of Science
   J. Neyman
14. Representative of the I. M. S. to the National Research Council, Division of Physical
   Sciences
   W. Bartky
15. Representative of the I. M. S. to the Mathematical Policy Committee
   H. Scheffé
16. Representative of the I. M. S. to the Joint Committee for Development of Statistical
   Applications in Engineering and Manufacturing
   B. Epstein
17. Representatives of the I. M. S. to the Inter-Society Committee on the Mathematical
   Training of Social Scientists
   W. G. Madow, Chairman; T. W. Anderson
18. Representative of the I. M. S. to the American Academy of Political and Social Science
   F. F. Stephan


J. L. DOOB
President
December 27, 1950


REPORT OF THE SECRETARY-TREASURER OF THE 
INSTITUTE FOR 1950 

At the beginning of 1950 the Institute had 1164 members. During the period
covered by this report, 129 new members (6 of whom were appointed to membership
by Institutional Members) joined the Institute and 19 were reinstated. During 1950
the Institute lost 73 members: 26 by resignation, 46 by cancellation for
non-payment of dues, and one by death. Judging from the information available at
this date, the Institute will have 1239 members as it starts 1951.
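As a check on the membership figures above, the following minimal Python sketch reconciles them. The variable names are illustrative only, and the sketch assumes, as the report's own totals imply, that the 129 new members include the 6 appointed by Institutional Members and that the projected 1951 figure is simply the opening count plus additions minus losses.

```python
# Reconcile the 1950 membership counts stated in the Secretary-Treasurer's report.
# Variable names are illustrative; the figures are those given in the report.
members_start_1950 = 1164
new_members = 129          # assumed to include the 6 appointed by Institutional Members
reinstated = 19
resigned = 26
cancelled_for_dues = 46
deceased = 1

# Total losses during 1950, and the resulting count at the start of 1951.
lost = resigned + cancelled_for_dues + deceased
members_start_1951 = members_start_1950 + new_members + reinstated - lost

print(lost)                # 73
print(members_start_1951)  # 1239
```

Both results agree with the totals the report gives (73 members lost, 1239 members entering 1951).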

During the year, this office solicited for membership the statisticians named on
a list of statisticians in countries outside the United States, compiled by the
Committee to Encourage Membership outside the United States, headed by T. W.
Anderson. Thirty-three of the new members were obtained primarily because of
this solicitation, and a few additional membership applications may still result
from this campaign.

Meetings of the Institute held during 1950 included those at Chapel Hill, 
North Carolina on March 17-18; at Chicago, Illinois on April 28-29; at Berkeley, 
California on August 5; and at Chicago on December 27-29, 1950. The Secretary 
wishes to call attention to the excellent work of Professor W. G. Madow, Program
Chairman for the December, 1950 annual meeting and of the members who 
served as Assistant and Associate Secretaries at these meetings: Professor 
Herbert E. Robbins at Chapel Hill; Professor K. J. Arnold at Chicago in April 
and December; and Professor L. J. Savage at Chicago. 

The following Fellows served as members of the Committee on Fellows: 
Henry Scheffé, Chairman, T. W. Anderson, M. A. Girshick, E. L. Lehmann, 
H. E. Robbins, and F. F. Stephan. 


INSTITUTE OF MATHEMATICAL STATISTICS
Statement of Condition
December 31, 1950

ASSETS

  Dues Receivable
  Subscriptions Receivable

  Total Assets

LIABILITIES AND RESERVES

  Amount Due to Printer                        $2,145.00
  Withholding Tax Payable                         106.20
  Miscellaneous Liabilities                        75.25   $2,326.45

  Reserve for Dues Advanced
  Reserve for Subscriptions Advanced
  Reserve for Life Members                      2,757.50    5,262.16

  Total Liabilities and Reserves
  *Surplus (Excess of Assets over Liabilities)
*Surplus is not adjusted for the inventory of back issues, estimated at a nominal value of
$621.00 (67¢ per issue).

Neither is the surplus adjusted for the reserve for reprinting back issues. There are four
issues of which we have fewer than 25 copies on hand; these will have to be reprinted early
in 1951 at an estimated cost of $1,300. There are four other issues of which we have between
26 and 50 copies, and these will probably have to be reprinted later in the year.

Revenue and Expense Statement
For the year ending December 31, 1950

Revenues
  Dues Revenue                                         $9,087.
  Subscriptions Revenue                               3,875.48
  Sale of Back Issues                                 5,465.81
  Interest Earned on Bonds                                100.
  Miscellaneous Revenue                                 124.94   $18,653.93

Expenses
  Printing of Annals Current                          $8,774.8
  Reprinting of Back Issues                             1,313.
  Salary Expense                                        3,000.
  Miscellaneous Printing of Stationery and Postage      891.77
  Contributions to American Mathematical Society        244.13
  Miscellaneous Office Expense                          229.55
  Editorial Expense                                     200.00
  Meeting Expense                                        77.45
  Binding Expense                                        35.00   $14,765.91

Excess of Revenues over Expenses                                  $3,888.02
Excess of Liabilities over Assets December 31, 1949                2,027.19

Excess of Assets over Liabilities December 31, 1950               $1,860.83
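The bottom lines of the statement above can be verified with two subtractions. This is a minimal Python sketch, not part of the Treasurer's accounts: the variable names are our own, it uses the standard `decimal` module so that cents are handled exactly rather than with binary floating point, and it relies only on the totals printed in full, since several individual entries are incompletely printed.

```python
from decimal import Decimal

# Totals as printed in the Revenue and Expense Statement for 1950.
total_revenues = Decimal("18653.93")
total_expenses = Decimal("14765.91")
# Excess of liabilities over assets at December 31, 1949 (a deficit carried forward).
deficit_end_1949 = Decimal("2027.19")

# The year's operating result, then the surplus after absorbing the prior deficit.
excess_revenues = total_revenues - total_expenses
surplus_end_1950 = excess_revenues - deficit_end_1949

print(excess_revenues)   # 3888.02
print(surplus_end_1950)  # 1860.83
```

Both figures match the statement: the 1950 excess of revenues over expenses more than covered the deficit carried from 1949.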


It has been our practice to set up an amount equal to all life membership 
payments as a liability and to hold all these funds in reserve until the death of 
the member—after which his payment is released to the general fund. There were 
no new life membership payments during 1950 nor were there any deaths among 
life members. The total number of life members therefore remains 32.

In the last annual report, the membership was advised that the accounting 
system of the Institute was going to be completely re-organized and placed on a 
modern basis. The form of the preceding report results from this change. We no 
longer take credit for prepaid dues and subscriptions for the coming year in our 
report of revenue and expenses for the year just closed. We have followed the 
advice of our accounting consultants in not including among our assets the very 
non-liquid inventory of back issues of the Annals. 

During 1950, we had a remarkable sale of back issues amounting to $5465.81, 
an increase of approximately $2100 over the previous year and $2500 over 1948. 
Furthermore, we reprinted only four issues during the year, whereas we re- 
printed twelve issues in each of the two preceding years. This extremely favor- 
able conjunction of items more than compensated for the excess of our normal 
operating cost over our normal operating income from dues and subscriptions. 
It is very unlikely that such a situation will prevail again, at least in the imme- 
diate future. The present international situation will probably cut our back 
number sales appreciably. Furthermore, we now have twelve issues in relatively 
short supply which may have to be reprinted in 1951. We were fortunate in 1950, 
but it would be expecting too much to count on a windfall from back numbers 
to take up the slack resulting from inadequate dues and subscription rates. 
CARL H. FISCHER


Secretary-Treasurer 
December 20, 1950 


———


REPORT OF THE EDITOR OF THE ANNALS FOR 1950


During the past year a new editorial organization was instituted for the Annals.
The Constitution of the Institute of Mathematical Statistics adopted in 1948 
provides for the election of Associate Editors by the Council. In the new organi- 
zation, the Associate Editors not only collaborate with the Editor on matters of 
policy but also assume a large share of the responsibility for consideration of 
manuscripts submitted to the Annals. In establishing policy and procedure, the 
Editorial Committee has relied heavily on the study made by the Committee on 
Editorial Policy of the Annals. 

The Editorial Committee wishes to acknowledge the cooperation of the previous
Editor, S. S. Wilks, in the inauguration of the new editorship. The front
cover of the Annals now bears recognition of Professor Wilks’ accomplishments 
during his twelve years as Editor of the Annals, in accordance with the resolu- 
tion passed by the Institute of Mathematical Statistics at its 1949 membership 
meeting. 

The 1950 volume of the Annals contained 56 papers, of which 22 were notes.
The number of pages printed, 624, was about the same as in the several pre- 
ceding years. The need for an increased number of pages for the Annals con- 
tinues. Unfortunately, during the past year the cost of printing has risen approxi- 
mately 10%. It is unavoidable that the budget for the Annals be increased. 

“Fundamental Limit Theorems of Probability Theory” by M. Loève, published
in the September, 1950, issue, was the first paper invited by the Special
Invited Papers Committee. It is expected that by the invitation of this com- 
mittee more expository and review papers will be provided for the Annals. 

On behalf of the Editorial Committee, the Editor takes this opportunity to 
acknowledge the generous refereeing assistance of the following: F. C. Andrews, 
F. J. Anscombe, Kenneth Arrow, E. W. Barankin, Robert Bechhofer, Agnes 
Berger, Z. W. Birnbaum, C. Blyth, Albert Bowker, D. G. Chapman, H. Cher- 
noff, Randal H. Cole, Allen T. Craig, C. C. Craig, D. A. Darling, W. J. Dixon, 
H. F. Dodge, S. G. Ghurye, H. R. J. Grosch, Frank E. Grubbs, Leon Herbach,
J. L. Hodges, Jr., W. Hoeffding, E. L. Kaplan, J. L. Kelley, William Kruskal, 
P. J. McCarthy, Paul Meier, R. J. Monroe, R. B. Murphy, G. E. Noether, Ed- 
ward Paulson, Melvin Peisakoff, John Riordan, H. G. Romig, Herman Rubin, 
L. J. Savage, W. L. Scott, E. Seiden, R. G. D. Steel, Charles Stein, D. F. Votaw, 
Jr., John E. Walsh, Ransom Whitney. 

The Editor is indebted to David Bruce Hanchett and Jack Laderman for 
preparation of manuscripts for the printer and to Miss Jean Hanson for other 
editorial and office assistance. 

T. W. ANDERSON 
Editor 
December 8, 1950 


——— 


PUBLICATIONS RECEIVED 


The Editor has received a number of books from publishers for review pur- 
poses. Because of the present shortage of space in the Annals due to the large 
number of papers submitted and because several other journals carry on a 
broad reviewing service (particularly Mathematical Reviews and the Journal of 
the American Statistical Association), the Editorial Committee has decided that 
at this time Annals space cannot be devoted to reviews of these books. For the 
information of the readers, the publications received will be listed. 


Acceptance Sampling (A Symposium), American Statistical Association, Washington, D.C., 
1950, iv + 155 pp., $1.50. 

Anuario Estadístico de España, (Instituto Nacional de Estadística) Presidencia del
Gobierno, Madrid, 1950, lv + 898 pp.

FELLER, WILLIAM, An Introduction to Probability Theory and Its Applications, Vol. 1, John
Wiley and Sons, Inc., New York, 1950, xii + 419 pp., $6.00. 

GEBELEIN, H., Zahl und Wirklichkeit, Grundzüge einer Mathematischen Statistik, 2nd ed.,
Quelle and Meyer, Heidelberg, 1949, xii + 430 pp. 

Measurement and Prediction, (Studies in Social Psychology in World War II, Vol. 4), 
Princeton University Press, Princeton, 1950, x + 756 pp., $10.00. 

MOOD, ALEXANDER MCFARLANE, Introduction to the Theory of Statistics, McGraw-Hill Book
Company, Inc., New York, 1950, xiii + 433 pp., $5.00. 

NEYMAN, J., First Course in Probability and Statistics, Henry Holt and Company, New York,
1950, ix + 350 pp., $3.50. 

Table of Bessel Functions Y₀(z) and Y₁(z) for Complex Arguments, (Computation Laboratory,
National Bureau of Standards) Columbia University Press, New York, 1950, xl + 427 
pp., $7.50. 





BIOMETRIKA 
A Journal for the Statistical Study of Biological Problems 


Volume 37, Parts 3 and 4, December 1950

Contents


1. A simple stochastic epidemic. By N. T. J. BAILEY. 2. On the Fisher-Behrens test. By G. A.
BARNARD. 3. The incomplete Beta Function as a contour integral and a quickly converging series for
its inverse. By M. E. WISE. 4. On the levels of significance of the incomplete Beta Function and the
F-distribution. By L. A. AROIAN. 5. On the generalized second limit-theorem in the calculus of
probabilities. By D. G. KENDALL and K. S. RAO. 6. A note on the cumulants of Kendall’s S-distribution.
By H. SILVERSTONE. 7. The comparison of percentages in matched samples. By W. G. COCHRAN.
8. The distribution of the variance-ratio in random samples of any size drawn from non-normal universes.
By A. K. GAYEN. 9. The exact partition of χ² and its application to the problem of the pooling of small
expectations. By H. O. LANCASTER. 10. Use of range in analysis of variance. By H. O. HARTLEY.
11. On the comparison of estimators. By N. L. JOHNSON. 12. A rapid method for ascertaining serial
lag correlation. By G. D. GIBSON. 13. Tables of the χ²-integral and of the cumulative Poisson
distribution. By H. O. HARTLEY and E. S. PEARSON. 14. The maximum F-ratio as a short-cut test for
heterogeneity of variance. By H. O. HARTLEY. 15. On the sequential t-test. By S. RUSHTON.
16. Properties of some tests in sequential analysis. By A. G. BAKER. 17. The unbiassed estimation of
heterogeneous error variances. By A. S. C. EHRENBERG. 18. Sampling theory of the negative binomial
and logarithmic series distributions. By F. J. ANSCOMBE. 19. On questions raised by the combinations
of tests based on discontinuous distributions. By E. S. PEARSON. 20. Significance of difference between
the means of two non-normal samples. By A. K. GAYEN. 21. Testing for serial correlation in least squares
regression—I. By J. DURBIN and G. S. WATSON. 22. Distribution of ‘Student’-Fisher’s t in samples
from compound normal functions. By H. HYRENIUS. 23. MISCELLANEA. 24. REVIEWS.


The subscription price, payable in advance, is 45s. inland, 54s. export (per volume including postage). Cheques
should be drawn to Biometrika and sent to “The Secretary, Biometrika Office, Department of
Statistics, University College, London, W.C. 1.” All foreign cheques must be in sterling and drawn on a bank
having a London agency.


SKANDINAVISK 
AKTUARIETIDSKRIFT 


1950 - Parts 1 - 2 


Contents 
GERHARD ARFWEDSEN .... Some Problems in the Collective Theory of Risk
E. KIVIKOSKI .... Ein Vernachlässigtes Interpolationsverfahren
E. ZWINGGI .... A Study of the Dependence of the Premium on the Rate of Interest
ERIC MICHALUP .... On Inverse Linear Interpolation
H. V. MUHSAM .... An Attempt to Classify Life Tables


Annual subscription: 10 Swedish Crowns (Approx. $2.00). 
Inquiries and orders may be addressed to the Editor, 
SKARVIKSVÄGEN 7, DJURSHOLM (SWEDEN)





ROYAL STATISTICAL SOCIETY 


SPECIAL REPRINTS 


SYMPOSIUM ON STOCHASTIC PROCESSES: 
Stochastic Processes and Statistical Physics, J. E. MOYAL
Some Evolutionary Stochastic Processes, M. S. BARTLETT
Stochastic Processes and Population Growth, D. G. KENDALL
(With Discussion on the papers)
Price, post free, 12s. 6d.


TABLES OF SEQUENTIAL INSPECTION SCHEMES TO
CONTROL FRACTION DEFECTIVE, F. J. ANSCOMBE
Price, post free, 2s. 6d.


These papers, published in the Journal of the Royal Statistical Society,
1949, have now been issued in reprint form. Copies may be obtained
direct from 


The Royal Statistical Society, 
4, Portugal Street,
London, W.C.2. 





MATHEMATICAL REVIEWS 


A journal containing reviews of the mathematical liter- 
ature of the world, with full subject and author indices 


Publication of this journal is sponsored by the American Mathe- 
matical Society, Mathematical Association of America, Institute of 
Mathematical Statistics, London Mathematical Society, Edinburgh 
Mathematical Society, Unión Matemática Argentina, and others.


Subscriptions accepted to cover the calendar year only. 
Issues appear monthly except July. $20.00 per year. 
Send subscription order or request for sample copy to 


AMERICAN MATHEMATICAL SOCIETY 
531 West 116th Street, New York City 27 





SANKHYA 
The Indian Journal of Statistics 
Edited by P. C. Mahalanobis 


Vol. X, Parts 1 and 2, 1950 


Statistical Analysis of Some Physiological Experiments
ARTHUR LINDER AND ETIENNE GRANDJEAN
On a Problem of Two Dimensional Probability .... R. C. BOSE
Univariate and Multivariate Analysis as Problems in Testing of Composite
Hypotheses—I .... S. N. ROY
The Theory of Fractional Replication in Factorial Experiments
C. RADHAKRISHNA RAO
A Note on the Fractional Replication of Factorial Arrangements .... K. S. BANERJEE
Two Dimensional Systematic Sampling and the Associated Stratified and Random Sampling
A. C. DAS
harps oie A. GHosH 


Note on “Anthropometric Survey of the United Provinces, 1941:
A Statistical Study, Part III—Anthropological Observations”
G. MORGENSTIERNE
United Nations Economic and Social Council Sub-Commission on Statistical 
Sampling: Report to the Statistical Commission 
Book Reviews 


Annual subscription: 30 rupees
Inquiries and orders may be addressed to the
Editor, Sankhyā, Presidency College, Calcutta, India.


ECONOMETRICA 


Journal of the Econometric Society 


Contents of Vol. 19, January, 1951, include: 


R. M. GOODWIN
The Nonlinear Accelerator and the Persistence of Business Cycles
JAMES N. MORGAN
Consumer Substitutions between Butter and Margarine
STEPHEN ENKE
Equilibrium among Spatially Separated Markets: Solution by Electric Analogue
Report of the Berkeley Meeting, August 1-5, 1950
Report of the Harvard Meeting, August 31-September 5, 1950
Report of the Council for 1950 
Treasurer’s Report 
Election of Fellows, 1950 
Announcements 


Published Quarterly. Subscription to Nonmembers: $9.00 per year


The Econometric Society is an international society for the advancement of economic theory in its 
relation to statistics and mathematics. 


Subscriptions to Econometrica and inquiries about the work of the Society and the procedure in 
applying for membership should be addressed to William B. Simpson, Secretary, The Econometric So- 
ciety, The University of Chicago, Chicago 37, Illinois, U.S.A. 







JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
December 1950, Vol. 45, No. 252
1108 16th Street, N. W., Washington 6, D. C.


Who Are the Unemployed? .... PHILIP M. HAUSER AND ROBERT B. PEARL
Some Sampling Simplified .... JOHN W. TUKEY
The Effectiveness of Quality Control Charts .... LEO A. AROIAN AND HOWARD LEVENE
Two-Choice Selection .... IRWIN BROSS
Operations Analysis and the Theory of Games: An Advertising Example .... LEONARD GILLMAN
Design of Experiments for Most Precise Slope Estimation or Linear Extrapolation
CUTHBERT DANIEL AND NICHOLAS HEEREMA
Sequential Sampling from Finite Lots When the Proportion Defective Is Small
J. H. CHUNG
Correction to “Some New Aspects of the Application of Maximum Likelihood to the
Calculation of the Dosage Response Curve” .... JEROME CORNFIELD
Index of Journal, Volume 45, 1950 (Numbers 249, 250, 251, 252) 
Book Reviews 


The American Statistical Association invites as members all persons interested in:
1. development of new theory and method 
2. improvement of basic statistical data 
3. application of statistical methods to practical problems. 











